@@ -69,7 +69,107 @@ This will show you a list of all models that can be run locally, including their
## 🧑💻 Integrate with your applications using the SDK
-Foundry Local has an easy-to-use SDK (Python, JavaScript) to get you started with existing applications:
+Foundry Local has an easy-to-use SDK (C#, Python, JavaScript) to get you started with existing applications:
+
+### C#
+
+The C# SDK is available as a package on NuGet. You can install it using the .NET CLI:
+
+```bash
+dotnet add package Microsoft.AI.Foundry.Local.WinML
+```
+
+> [!TIP]
+> The C# SDK does not require end users to have the Foundry Local CLI installed. It is a completely self-contained SDK that does not depend on any external services. The C# SDK also provides native in-process Chat Completions and Audio Transcription APIs that do not require HTTP calls to the local Foundry service.
+
+Here is an example of using the C# SDK to run a model and generate a chat completion:
+
+```csharp
+using Microsoft.AI.Foundry.Local;
+using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
+using Microsoft.Extensions.Logging;
+
+CancellationToken ct = new CancellationToken();
+
+var config = new Configuration
+{
+ AppName = "my-app-name",
+ LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Debug
+};
+
+using var loggerFactory = LoggerFactory.Create(builder =>
+{
+ builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Debug);
+});
+var logger = loggerFactory.CreateLogger("my-app-name");
+
+// Initialize the singleton instance.
+await FoundryLocalManager.CreateAsync(config, logger);
+var mgr = FoundryLocalManager.Instance;
+
+// Get the model catalog
+var catalog = await mgr.GetCatalogAsync();
+
+// List available models
+Console.WriteLine("Available models for your hardware:");
+var models = await catalog.ListModelsAsync();
+foreach (var availableModel in models)
+{
+ foreach (var variant in availableModel.Variants)
+ {
+        Console.WriteLine($"  - Alias: {variant.Alias} (Id: {variant.Id})");
+ }
+}
+
+// Get a model using an alias
+var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
+
+
+// is model cached
+Console.WriteLine($"Is model cached: {await model.IsCachedAsync()}");
+
+// print out cached models
+var cachedModels = await catalog.GetCachedModelsAsync();
+Console.WriteLine("Cached models:");
+foreach (var cachedModel in cachedModels)
+{
+ Console.WriteLine($"- {cachedModel.Alias} ({cachedModel.Id})");
+}
+
+// Download the model (the method skips download if already cached)
+await model.DownloadAsync(progress =>
+{
+ Console.Write($"\rDownloading model: {progress:F2}%");
+ if (progress >= 100f)
+ {
+ Console.WriteLine();
+ }
+});
+
+// Load the model
+await model.LoadAsync();
+
+// Get a chat client
+var chatClient = await model.GetChatClientAsync();
+
+// Create a chat message
+List<ChatMessage> messages = new()
+{
+ new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
+};
+
+var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
+await foreach (var chunk in streamingResponse)
+{
+ Console.Write(chunk.Choices[0].Message.Content);
+ Console.Out.Flush();
+}
+Console.WriteLine();
+
+// Tidy up - unload the model
+await model.UnloadAsync();
+```
+
### Python
diff --git a/docs/README.md b/docs/README.md
index 4020a58..ee8f846 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,96 +1,7 @@
-# Get Started with Foundry Local
+# Foundry Local Documentation
-This guide provides detailed instructions on installing, configuring, and using Foundry Local to run AI models on your device.
+The Foundry Local documentation is provided on [Microsoft Learn](https://learn.microsoft.com/azure/ai-foundry/foundry-local/), where you can find comprehensive guides and tutorials to help you get started and make the most of Foundry Local.
-## Prerequisites
+## API Reference
-- A PC with sufficient specifications to run AI models locally
- - Windows 10 or later
- - Greater than 8GB RAM
- - Greater than 10GB of free disk space for model caching (quantized Phi 3.2 models are ~3GB)
-- Suggested hardware for optimal performance:
- - Windows 11
- - NVIDIA GPU (2000 series or newer) OR AMD GPU (6000 series or newer) OR Qualcomm Snapdragon X Elite, with 8GB or more of VRAM
- - Greater than 16GB RAM
- - Greater than 20GB of free disk space for model caching (the largest models are ~15GB)
-- Administrator access to install software
-
-## Installation
-
-1. Download Foundry Local for your platform from the [releases page](https://github.com/microsoft/Foundry-Local/releases).
-2. Install the package by following the on-screen prompts.
-3. After installation, access the tool via command line with `foundry`.
-
-## Running Your First Model
-
-1. Open a command prompt or terminal window.
-2. Run a model using the following command:
-
- ```bash
- foundry model run phi-3.5-mini
- ```
-
- This command will:
-
- - Download the model to your local disk
- - Load the model into your device
- - Start a chat interface
-
-**💡 TIP:** Replace `phi-3.5-mini` with any model from the catalog. Use `foundry model list` to see available models.
-
-## Explore Foundry Local CLI commands
-
-The foundry CLI is structured into several categories:
-
-- **Model**: Commands related to managing and running models
-- **Service**: Commands for managing the Foundry Local service
-- **Cache**: Commands for managing the local cache where models are stored
-
-To see all available commands, use the help option:
-
-```bash
-foundry --help
-```
-
-**💡 TIP:** For a complete reference of all available CLI commands and their usage, see the [Foundry Local CLI Reference](./reference/reference-cli.md)
-
-## Integrating with Applications
-
-Foundry Local provides an OpenAI-compatible REST API at `http://localhost:PORT/v1`.
-
-- Note that the port will be dynamically assigned, so check the logs for the correct port.
-
-### REST API Example
-
-```bash
-curl http://localhost:5273/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "Phi-3.5-mini-instruct-generic-cpu",
- "messages": [{"role": "user", "content": "What is the capital of France?"}],
- "temperature": 0.7,
- "max_tokens": 50
- }'
-```
-
-Read about all the samples we have for various languages and platforms in the [Integrate with Inference SDKs](./how-to/integrate-with-inference-sdks.md) section.
-
-## Troubleshooting
-
-### Common Issues and Solutions
-
-| Issue | Possible Cause | Solution |
-| ----------------------- | --------------------------------------- | ----------------------------------------------------------------------------------------- |
-| Slow inference | CPU-only model on large parameter count | Use GPU-optimized model variants when available |
-| Model download failures | Network connectivity issues | Check your internet connection, try `foundry cache list` to verify cache state |
-| Service won't start | Port conflicts or permission issues | Try `foundry service restart` or post an issue providing logs with `foundry zip-logsrock` |
-
-For more information, see the [troubleshooting guide](./reference/reference-troubleshooting.md).
-
-## Next Steps
-
-- [Learn more about Foundry Local](./what-is-foundry-local.md)
-- [Integrate with inferencing SDKs](./how-to/integrate-with-inference-sdks.md)
-- [Compile models for Foundry Local](./how-to/compile-models-for-foundry-local.md)
-- [Build a chat application](./tutorials/chat-application-with-open-web-ui.md)
-- [Use Langchain](./tutorials/use-langchain-with-foundry-local.md)
+- [Foundry Local C# SDK API Reference](./cs-api/Microsoft.AI.Foundry.Local.md)
\ No newline at end of file
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.Configuration.WebService.md b/docs/cs-api/Microsoft.AI.Foundry.Local.Configuration.WebService.md
new file mode 100644
index 0000000..2a7893c
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.Configuration.WebService.md
@@ -0,0 +1,56 @@
+# Class Configuration.WebService
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Configuration settings if the optional web service is used.
+
+```csharp
+public class Configuration.WebService
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[Configuration.WebService](Microsoft.AI.Foundry.Local.Configuration.WebService.md)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### ExternalUrl
+
+If the web service is running in a separate process, it will be accessed using this URI.
+Both processes should be using the same version of the SDK. If a random port is assigned when creating
+the web service in the external process, the actual port must be provided here.
+
+```csharp
+public Uri? ExternalUrl { get; init; }
+```
+
+#### Property Value
+
+ [Uri](https://learn.microsoft.com/dotnet/api/system.uri)?
+
+### Urls
+
+URLs to bind the web service to when StartWebServiceAsync is called.
+After startup, this will contain the actual URLs the service is listening on.
+Default: 127.0.0.1:0, which binds to a random ephemeral port. Multiple URLs can be specified as a semicolon-separated list.
+
+```csharp
+public string? Urls { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
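+## Example
+
+A minimal sketch of how the web service settings might be supplied when initializing the SDK. The URL value is a placeholder; any valid binding string should work.
+
+```csharp
+using Microsoft.AI.Foundry.Local;
+
+var config = new Configuration
+{
+    AppName = "my-app-name",
+    Web = new Configuration.WebService
+    {
+        // Bind to a fixed local port instead of the default random ephemeral port.
+        Urls = "http://127.0.0.1:5272"
+    }
+};
+```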
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.Configuration.md b/docs/cs-api/Microsoft.AI.Foundry.Local.Configuration.md
new file mode 100644
index 0000000..932755e
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.Configuration.md
@@ -0,0 +1,119 @@
+# Class Configuration
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Foundry Local configuration used to initialize the singleton.
+
+```csharp
+public class Configuration
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[Configuration](Microsoft.AI.Foundry.Local.Configuration.md)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### AdditionalSettings
+
+Additional settings that Foundry Local Core can consume.
+Keys and values are strings.
+
+```csharp
+public IDictionary<string, string>? AdditionalSettings { get; init; }
+```
+
+#### Property Value
+
+ [IDictionary](https://learn.microsoft.com/dotnet/api/system.collections.generic.idictionary\-2)<[string](https://learn.microsoft.com/dotnet/api/system.string), [string](https://learn.microsoft.com/dotnet/api/system.string)\>?
+
+### AppDataDir
+
+Application data directory.
+Default: {home}/.{appname}, where {home} is the user's home directory and {appname} is the AppName value.
+
+```csharp
+public string? AppDataDir { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### AppName
+
+Your application name. MUST be set to a valid name.
+
+```csharp
+public required string AppName { get; set; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### LogLevel
+
+Logging level.
+Valid values are: Verbose, Debug, Information, Warning, Error, Fatal.
+Default: .
+
+```csharp
+public LogLevel LogLevel { get; init; }
+```
+
+#### Property Value
+
+ [LogLevel](Microsoft.AI.Foundry.Local.LogLevel.md)
+
+### LogsDir
+
+Log directory.
+Default: {appdata}/logs
+
+```csharp
+public string? LogsDir { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### ModelCacheDir
+
+Model cache directory.
+Default: {appdata}/cache/models, where {appdata} is the AppDataDir value.
+
+```csharp
+public string? ModelCacheDir { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### Web
+
+Optional configuration for the built-in web service.
+NOTE: This is not included in all builds.
+
+```csharp
+public Configuration.WebService? Web { get; init; }
+```
+
+#### Property Value
+
+ [Configuration](Microsoft.AI.Foundry.Local.Configuration.md).[WebService](Microsoft.AI.Foundry.Local.Configuration.WebService.md)?
+
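+## Example
+
+A minimal sketch pulling several of these properties together. Only AppName is required; the directory paths shown here are placeholders.
+
+```csharp
+using Microsoft.AI.Foundry.Local;
+
+var config = new Configuration
+{
+    AppName = "my-app-name",                       // required
+    LogLevel = LogLevel.Information,
+    AppDataDir = @"C:\Data\my-app-name",           // default: {home}/.{appname}
+    ModelCacheDir = @"C:\Data\my-app-name\models"  // default: {appdata}/cache/models
+};
+```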
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.DeviceType.md b/docs/cs-api/Microsoft.AI.Foundry.Local.DeviceType.md
new file mode 100644
index 0000000..237d7ad
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.DeviceType.md
@@ -0,0 +1,38 @@
+# Enum DeviceType
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Device types supported by the runtime for model execution.
+
+```csharp
+[JsonConverter(typeof(JsonStringEnumConverter))]
+public enum DeviceType
+```
+
+## Fields
+
+`CPU = 1`
+
+Standard system CPU.
+
+
+
+`GPU = 2`
+
+Discrete or integrated GPU device.
+
+
+
+`Invalid = 0`
+
+Invalid / unspecified device type.
+
+
+
+`NPU = 3`
+
+Neural Processing Unit.
+
+
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.FoundryLocalException.md b/docs/cs-api/Microsoft.AI.Foundry.Local.FoundryLocalException.md
new file mode 100644
index 0000000..abd421d
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.FoundryLocalException.md
@@ -0,0 +1,78 @@
+# Class FoundryLocalException
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Exception type thrown by the Foundry Local SDK to represent operational or initialization errors.
+
+```csharp
+public class FoundryLocalException : Exception, ISerializable
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[Exception](https://learn.microsoft.com/dotnet/api/system.exception) ←
+[FoundryLocalException](Microsoft.AI.Foundry.Local.FoundryLocalException.md)
+
+#### Implements
+
+[ISerializable](https://learn.microsoft.com/dotnet/api/system.runtime.serialization.iserializable)
+
+#### Inherited Members
+
+[Exception.GetBaseException\(\)](https://learn.microsoft.com/dotnet/api/system.exception.getbaseexception),
+[Exception.GetObjectData\(SerializationInfo, StreamingContext\)](https://learn.microsoft.com/dotnet/api/system.exception.getobjectdata),
+[Exception.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.exception.gettype),
+[Exception.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.exception.tostring),
+[Exception.Data](https://learn.microsoft.com/dotnet/api/system.exception.data),
+[Exception.HelpLink](https://learn.microsoft.com/dotnet/api/system.exception.helplink),
+[Exception.HResult](https://learn.microsoft.com/dotnet/api/system.exception.hresult),
+[Exception.InnerException](https://learn.microsoft.com/dotnet/api/system.exception.innerexception),
+[Exception.Message](https://learn.microsoft.com/dotnet/api/system.exception.message),
+[Exception.Source](https://learn.microsoft.com/dotnet/api/system.exception.source),
+[Exception.StackTrace](https://learn.microsoft.com/dotnet/api/system.exception.stacktrace),
+[Exception.TargetSite](https://learn.microsoft.com/dotnet/api/system.exception.targetsite),
+[Exception.SerializeObjectState](https://learn.microsoft.com/dotnet/api/system.exception.serializeobjectstate),
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Constructors
+
+### FoundryLocalException\(string\)
+
+Create a new FoundryLocalException.
+
+```csharp
+public FoundryLocalException(string message)
+```
+
+#### Parameters
+
+`message` [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+Error message.
+
+### FoundryLocalException\(string, Exception\)
+
+Create a new FoundryLocalException with an inner exception.
+
+```csharp
+public FoundryLocalException(string message, Exception innerException)
+```
+
+#### Parameters
+
+`message` [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+Error message.
+
+`innerException` [Exception](https://learn.microsoft.com/dotnet/api/system.exception)
+
+Underlying exception.
+
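+## Example
+
+A minimal sketch of handling SDK errors, assuming the manager has already been initialized and `model` is a previously resolved model.
+
+```csharp
+try
+{
+    await model.LoadAsync();
+}
+catch (FoundryLocalException ex)
+{
+    // Operational and initialization errors from the SDK surface as FoundryLocalException.
+    Console.Error.WriteLine($"Foundry Local error: {ex.Message}");
+}
+```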
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.FoundryLocalManager.md b/docs/cs-api/Microsoft.AI.Foundry.Local.FoundryLocalManager.md
new file mode 100644
index 0000000..0a5dd8c
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.FoundryLocalManager.md
@@ -0,0 +1,212 @@
+# Class FoundryLocalManager
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Entry point for Foundry Local SDK providing initialization, catalog access, model management
+and optional web service hosting.
+
+```csharp
+public class FoundryLocalManager : IDisposable
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[FoundryLocalManager](Microsoft.AI.Foundry.Local.FoundryLocalManager.md)
+
+#### Implements
+
+[IDisposable](https://learn.microsoft.com/dotnet/api/system.idisposable)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Instance
+
+Singleton instance. Must call CreateAsync before use.
+
+```csharp
+public static FoundryLocalManager Instance { get; }
+```
+
+#### Property Value
+
+ [FoundryLocalManager](Microsoft.AI.Foundry.Local.FoundryLocalManager.md)
+
+### IsInitialized
+
+Has the manager been successfully initialized?
+
+```csharp
+public static bool IsInitialized { get; }
+```
+
+#### Property Value
+
+ [bool](https://learn.microsoft.com/dotnet/api/system.boolean)
+
+### Urls
+
+Bound URLs if the web service has been started; null otherwise.
+See StartWebServiceAsync.
+
+```csharp
+public string[]? Urls { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)\[\]?
+
+## Methods
+
+### CreateAsync\(Configuration, ILogger, CancellationToken?\)
+
+Create the singleton instance.
+
+```csharp
+public static Task CreateAsync(Configuration configuration, ILogger logger, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`configuration` [Configuration](Microsoft.AI.Foundry.Local.Configuration.md)
+
+Configuration to use.
+
+`logger` [ILogger](https://learn.microsoft.com/dotnet/api/microsoft.extensions.logging.ilogger)
+
+Application logger to use.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token for the initialization.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task creating the instance.
+
+### Dispose\(bool\)
+
+Dispose managed resources held by the manager.
+
+```csharp
+protected virtual void Dispose(bool disposing)
+```
+
+#### Parameters
+
+`disposing` [bool](https://learn.microsoft.com/dotnet/api/system.boolean)
+
+True when called from Dispose().
+
+### Dispose\(\)
+
+Dispose the manager instance.
+
+```csharp
+public void Dispose()
+```
+
+### EnsureEpsDownloadedAsync\(CancellationToken?\)
+
+Ensure execution providers are downloaded and registered (for the Microsoft.AI.Foundry.Local.WinML package).
+Subsequent calls are fast after initial download.
+
+```csharp
+public Task EnsureEpsDownloadedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
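+A minimal sketch (assuming the Microsoft.AI.Foundry.Local.WinML package) of pre-fetching execution providers so later catalog calls stay fast:
+
+```csharp
+// Run once during application startup; subsequent calls return quickly after the initial download.
+await FoundryLocalManager.Instance.EnsureEpsDownloadedAsync();
+```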
+### GetCatalogAsync\(CancellationToken?\)
+
+Get the model catalog instance. Populated on first use.
+
+```csharp
+public Task<ICatalog> GetCatalogAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[ICatalog](Microsoft.AI.Foundry.Local.ICatalog.md)\>
+
+The model catalog.
+
+#### Remarks
+
+If using Microsoft.AI.Foundry.Local.WinML, this will trigger the execution provider download if not already done.
+If the execution provider is already downloaded and up-to-date, this operation is fast. You can call
+EnsureEpsDownloadedAsync first to separate these operations - for example, during
+application startup.
+
+### StartWebServiceAsync\(CancellationToken?\)
+
+Start the optional web service exposing OpenAI-compatible endpoints. Supported endpoints:
+ /v1/chat/completions
+ /v1/audio/transcriptions
+ /v1/models
+ /v1/models/{model_id}
+
+```csharp
+public Task StartWebServiceAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task completing once service started.
+
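+A minimal sketch of hosting the OpenAI-compatible endpoints and reading back the bound URLs (assuming the manager has already been initialized and the package includes the web service):
+
+```csharp
+var mgr = FoundryLocalManager.Instance;
+await mgr.StartWebServiceAsync();
+Console.WriteLine($"Listening on: {string.Join(", ", mgr.Urls ?? Array.Empty<string>())}");
+
+// ... serve OpenAI-compatible requests ...
+
+await mgr.StopWebServiceAsync();
+```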
+### StopWebServiceAsync\(CancellationToken?\)
+
+Stops the web service if started.
+
+```csharp
+public Task StopWebServiceAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task completing once service stopped.
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.ICatalog.md b/docs/cs-api/Microsoft.AI.Foundry.Local.ICatalog.md
new file mode 100644
index 0000000..1eaa880
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.ICatalog.md
@@ -0,0 +1,133 @@
+# Interface ICatalog
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+```csharp
+public interface ICatalog
+```
+
+## Properties
+
+### Name
+
+The catalog name.
+
+```csharp
+string Name { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+## Methods
+
+### GetCachedModelsAsync\(CancellationToken?\)
+
+Get the list of currently downloaded models available in the local cache.
+
+```csharp
+Task<List<ModelVariant>> GetCachedModelsAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional CancellationToken.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[List](https://learn.microsoft.com/dotnet/api/system.collections.generic.list\-1)<[ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)\>\>
+
+List of ModelVariant instances.
+
+### GetLoadedModelsAsync\(CancellationToken?\)
+
+Get a list of the currently loaded models.
+
+```csharp
+Task<List<ModelVariant>> GetLoadedModelsAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional CancellationToken.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[List](https://learn.microsoft.com/dotnet/api/system.collections.generic.list\-1)<[ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)\>\>
+
+List of ModelVariant instances.
+
+### GetModelAsync\(string, CancellationToken?\)
+
+Lookup a model by its alias.
+
+```csharp
+Task<Model?> GetModelAsync(string modelAlias, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`modelAlias` [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+Model alias.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional CancellationToken.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[Model](Microsoft.AI.Foundry.Local.Model.md)?\>
+
+Model if found.
+
+### GetModelVariantAsync\(string, CancellationToken?\)
+
+Lookup a model variant by its unique model id.
+
+```csharp
+Task<ModelVariant?> GetModelVariantAsync(string modelId, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`modelId` [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+Model id.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional CancellationToken.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)?\>
+
+Model variant if found.
+
+### ListModelsAsync\(CancellationToken?\)
+
+List the available models in the catalog.
+
+```csharp
+Task<List<Model>> ListModelsAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional CancellationToken.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[List](https://learn.microsoft.com/dotnet/api/system.collections.generic.list\-1)<[Model](Microsoft.AI.Foundry.Local.Model.md)\>\>
+
+List of Model instances.
+
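+## Example
+
+A minimal sketch of working with the catalog beyond listing models: inspecting what is currently loaded and looking up a variant by id. The id string is a placeholder.
+
+```csharp
+var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
+
+// Models currently loaded into memory.
+foreach (var loaded in await catalog.GetLoadedModelsAsync())
+{
+    Console.WriteLine($"Loaded: {loaded.Alias} ({loaded.Id})");
+}
+
+// Look up a single variant by its unique model id (placeholder id shown).
+var variant = await catalog.GetModelVariantAsync("example-model-id");
+Console.WriteLine(variant is null ? "Variant not found" : $"Found: {variant.Id}");
+```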
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.IModel.md b/docs/cs-api/Microsoft.AI.Foundry.Local.IModel.md
new file mode 100644
index 0000000..189f7a5
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.IModel.md
@@ -0,0 +1,211 @@
+# Interface IModel
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Common operations for a model variant or model abstraction including caching, loading
+and client creation helpers.
+
+```csharp
+public interface IModel
+```
+
+## Properties
+
+### Alias
+
+```csharp
+[SuppressMessage("Naming", "CA1716:Identifiers should not match keywords", Justification = "Alias is a suitable name in this context.")]
+string Alias { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Id
+
+Unique model identifier.
+
+```csharp
+string Id { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+## Methods
+
+### DownloadAsync\(Action?, CancellationToken?\)
+
+Download the model files from the catalog.
+
+```csharp
+Task DownloadAsync(Action<float>? downloadProgress = null, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`downloadProgress` [Action](https://learn.microsoft.com/dotnet/api/system.action\-1)<[float](https://learn.microsoft.com/dotnet/api/system.single)\>?
+
+Optional progress callback, invoked on a separate thread, that
+ reports download progress as a percentage (float); the final reported value is 100. When the download is complete and all callbacks
+ have been made, the Task for the download completes.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task representing the asynchronous operation.
+
+### GetAudioClientAsync\(CancellationToken?\)
+
+Get an OpenAI API based AudioClient.
+
+```csharp
+Task<OpenAIAudioClient> GetAudioClientAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[OpenAIAudioClient](Microsoft.AI.Foundry.Local.OpenAIAudioClient.md)\>
+
+An OpenAIAudioClient instance.
+
+### GetChatClientAsync\(CancellationToken?\)
+
+Get an OpenAI API based ChatClient.
+
+```csharp
+Task<OpenAIChatClient> GetChatClientAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[OpenAIChatClient](Microsoft.AI.Foundry.Local.OpenAIChatClient.md)\>
+
+An OpenAIChatClient instance.
+
+### GetPathAsync\(CancellationToken?\)
+
+Gets the model path if cached.
+
+```csharp
+Task<string> GetPathAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[string](https://learn.microsoft.com/dotnet/api/system.string)\>
+
+Path of model directory.
+
+### IsCachedAsync\(CancellationToken?\)
+
+Is the model cached on the local filesystem?
+
+```csharp
+Task<bool> IsCachedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[bool](https://learn.microsoft.com/dotnet/api/system.boolean)\>
+
+### IsLoadedAsync\(CancellationToken?\)
+
+Is the model currently loaded in memory?
+
+```csharp
+Task<bool> IsLoadedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[bool](https://learn.microsoft.com/dotnet/api/system.boolean)\>
+
+### LoadAsync\(CancellationToken?\)
+
+Load the model into memory if not already loaded.
+
+```csharp
+Task LoadAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+### RemoveFromCacheAsync\(CancellationToken?\)
+
+Remove the model from the local cache.
+
+```csharp
+Task RemoveFromCacheAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+### UnloadAsync\(CancellationToken?\)
+
+Unload the model if loaded.
+
+```csharp
+Task UnloadAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.LogLevel.md b/docs/cs-api/Microsoft.AI.Foundry.Local.LogLevel.md
new file mode 100644
index 0000000..894f44f
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.LogLevel.md
@@ -0,0 +1,50 @@
+# Enum LogLevel
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Logging verbosity levels used by the Foundry Local SDK. These levels align with Serilog (Verbose, Debug, Information, Warning, Error, Fatal)
+and differ from Microsoft.Extensions.Logging.LogLevel, which includes Trace, Critical, and None.
+
+```csharp
+public enum LogLevel
+```
+
+## Fields
+
+`Debug = 1`
+
+Debug level diagnostic messages.
+
+
+
+`Error = 4`
+
+Recoverable error events.
+
+
+
+`Fatal = 5`
+
+Critical errors indicating severe issues.
+
+
+
+`Information = 2`
+
+Information messages describing normal operations.
+
+
+
+`Verbose = 0`
+
+Highly verbose diagnostic output.
+
+
+
+`Warning = 3`
+
+Warning events indicating potential issues.
+
+
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.Model.md b/docs/cs-api/Microsoft.AI.Foundry.Local.Model.md
new file mode 100644
index 0000000..e8b00db
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.Model.md
@@ -0,0 +1,306 @@
+# Class Model
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Represents a family of related model variants (versions or configurations) that share a common alias.
+Acts as a façade over its variants, letting you:
+ - enumerate and select a specific variant
+ - prefer a locally cached variant automatically
+ - resolve the latest version of a given variant
+ - download, load, unload, and remove from the cache the currently selected variant
+ - create chat and audio clients for the currently selected variant.
+Use ModelVariant when you need per-variant metadata; use Model when you want alias-level orchestration.
+
+```csharp
+public class Model : IModel
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[Model](Microsoft.AI.Foundry.Local.Model.md)
+
+#### Implements
+
+[IModel](Microsoft.AI.Foundry.Local.IModel.md)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Alias
+
+Model alias grouping multiple device-specific variants of the same underlying model.
+
+```csharp
+public string Alias { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Id
+
+Unique Id of the currently selected variant.
+
+```csharp
+public string Id { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### SelectedVariant
+
+Currently selected variant used for IModel operations.
+
+```csharp
+public ModelVariant SelectedVariant { get; }
+```
+
+#### Property Value
+
+ [ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)
+
+### Variants
+
+All known variants for this model alias.
+
+```csharp
+public List<ModelVariant> Variants { get; }
+```
+
+#### Property Value
+
+ [List](https://learn.microsoft.com/dotnet/api/system.collections.generic.list\-1)<[ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)\>
+
+## Methods
+
+### DownloadAsync\(Action?, CancellationToken?\)
+
+Download the model files from the catalog.
+
+```csharp
+public Task DownloadAsync(Action<float>? downloadProgress = null, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`downloadProgress` [Action](https://learn.microsoft.com/dotnet/api/system.action\-1)<[float](https://learn.microsoft.com/dotnet/api/system.single)\>?
+
+Optional progress callback, invoked on a separate thread, that
+ reports download progress as a percentage (float); the final reported value is 100. When the download is complete and all callbacks
+ have been made, the Task for the download completes.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task representing the asynchronous operation.
+
+### GetAudioClientAsync\(CancellationToken?\)
+
+Get an OpenAI API based AudioClient.
+
+```csharp
+public Task<OpenAIAudioClient> GetAudioClientAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[OpenAIAudioClient](Microsoft.AI.Foundry.Local.OpenAIAudioClient.md)\>
+
+An OpenAIAudioClient instance.
+
+### GetChatClientAsync\(CancellationToken?\)
+
+Get an OpenAI API based ChatClient.
+
+```csharp
+public Task<OpenAIChatClient> GetChatClientAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[OpenAIChatClient](Microsoft.AI.Foundry.Local.OpenAIChatClient.md)\>
+
+An OpenAIChatClient instance.
+
+### GetLatestVersion\(ModelVariant\)
+
+Get the latest version of the specified model variant.
+
+```csharp
+public ModelVariant GetLatestVersion(ModelVariant variant)
+```
+
+#### Parameters
+
+`variant` [ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)
+
+Model variant.
+
+#### Returns
+
+ [ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)
+
+ModelVariant for latest version. Same as variant if that is the latest version.
+
+#### Exceptions
+
+ [FoundryLocalException](Microsoft.AI.Foundry.Local.FoundryLocalException.md)
+
+If variant is not valid for this model.
+
+### GetPathAsync\(CancellationToken?\)
+
+Gets the model path if cached.
+
+```csharp
+public Task<string> GetPathAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[string](https://learn.microsoft.com/dotnet/api/system.string)\>
+
+Path of model directory.
+
+### IsCachedAsync\(CancellationToken?\)
+
+Is the currently selected variant cached locally?
+
+```csharp
+public Task<bool> IsCachedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[bool](https://learn.microsoft.com/dotnet/api/system.boolean)\>
+
+### IsLoadedAsync\(CancellationToken?\)
+
+Is the currently selected variant loaded in memory?
+
+```csharp
+public Task<bool> IsLoadedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[bool](https://learn.microsoft.com/dotnet/api/system.boolean)\>
+
+### LoadAsync\(CancellationToken?\)
+
+Load the model into memory if not already loaded.
+
+```csharp
+public Task LoadAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+### RemoveFromCacheAsync\(CancellationToken?\)
+
+Remove the model from the local cache.
+
+```csharp
+public Task RemoveFromCacheAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+### SelectVariant\(ModelVariant\)
+
+Select a specific model variant by its unique model ID.
+The selected variant will be used for subsequent IModel operations.
+
+```csharp
+public void SelectVariant(ModelVariant variant)
+```
+
+#### Parameters
+
+`variant` [ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)
+
+#### Exceptions
+
+ [FoundryLocalException](Microsoft.AI.Foundry.Local.FoundryLocalException.md)
+
+If variant is not valid for this model.
+
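+A minimal sketch of pinning a model to a specific variant before downloading or loading it. It assumes `catalog` was already obtained from the manager, reuses the alias from the quickstart example, and the variant choice here is arbitrary:
+
+```csharp
+var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
+
+// Pick the first listed variant, then switch to its latest version before use.
+var chosen = model.Variants[0];
+model.SelectVariant(model.GetLatestVersion(chosen));
+Console.WriteLine($"Selected variant: {model.SelectedVariant.Id}");
+```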
+### UnloadAsync\(CancellationToken?\)
+
+Unload the model if loaded.
+
+```csharp
+public Task UnloadAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.ModelInfo.md b/docs/cs-api/Microsoft.AI.Foundry.Local.ModelInfo.md
new file mode 100644
index 0000000..cdb62cd
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.ModelInfo.md
@@ -0,0 +1,279 @@
+# Class ModelInfo
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Full descriptive metadata for a model variant within the catalog.
+
+```csharp
+public record ModelInfo : IEquatable<ModelInfo>
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[ModelInfo](Microsoft.AI.Foundry.Local.ModelInfo.md)
+
+#### Implements
+
+[IEquatable](https://learn.microsoft.com/dotnet/api/system.iequatable\-1)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Cached
+
+Indicates whether the model artifacts are currently cached locally.
+
+```csharp
+[JsonPropertyName("cached")]
+public bool Cached { get; init; }
+```
+
+#### Property Value
+
+ [bool](https://learn.microsoft.com/dotnet/api/system.boolean)
+
+### CreatedAtUnix
+
+Unix timestamp (seconds) when the model was added to the catalog.
+
+```csharp
+[JsonPropertyName("createdAt")]
+public long CreatedAtUnix { get; init; }
+```
+
+#### Property Value
+
+ [long](https://learn.microsoft.com/dotnet/api/system.int64)
+
+### DisplayName
+
+Friendly display name.
+
+```csharp
+[JsonPropertyName("displayName")]
+public string? DisplayName { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### FileSizeMb
+
+Approximate size of the model artifacts in megabytes.
+
+```csharp
+[JsonPropertyName("fileSizeMb")]
+public int? FileSizeMb { get; init; }
+```
+
+#### Property Value
+
+ [int](https://learn.microsoft.com/dotnet/api/system.int32)?
+
+### Id
+
+Globally unique model identifier.
+
+```csharp
+[JsonPropertyName("id")]
+public required string Id { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### License
+
+Short license identifier or name associated with this model.
+
+```csharp
+[JsonPropertyName("license")]
+public string? License { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### LicenseDescription
+
+Extended license description, terms, or URL providing more license details.
+
+```csharp
+[JsonPropertyName("licenseDescription")]
+public string? LicenseDescription { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### MaxOutputTokens
+
+Maximum supported output tokens for generation.
+
+```csharp
+[JsonPropertyName("maxOutputTokens")]
+public long? MaxOutputTokens { get; init; }
+```
+
+#### Property Value
+
+ [long](https://learn.microsoft.com/dotnet/api/system.int64)?
+
+### MinFLVersion
+
+Minimum required Foundry Local CLI version for this model.
+
+```csharp
+[JsonPropertyName("minFLVersion")]
+public string? MinFLVersion { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### ModelSettings
+
+Optional settings applied to this model variant (e.g. default parameter values).
+
+```csharp
+[JsonPropertyName("modelSettings")]
+public ModelSettings? ModelSettings { get; init; }
+```
+
+#### Property Value
+
+ [ModelSettings](Microsoft.AI.Foundry.Local.ModelSettings.md)?
+
+### ModelType
+
+The model type, for example ONNX.
+
+```csharp
+[JsonPropertyName("modelType")]
+public required string ModelType { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Name
+
+Internal model name (typically includes size / architecture).
+
+```csharp
+[JsonPropertyName("name")]
+public required string Name { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### PromptTemplate
+
+Optional prompt template guidance for this model.
+
+```csharp
+[JsonPropertyName("promptTemplate")]
+public PromptTemplate? PromptTemplate { get; init; }
+```
+
+#### Property Value
+
+ [PromptTemplate](Microsoft.AI.Foundry.Local.PromptTemplate.md)?
+
+### ProviderType
+
+Either AzureFoundry (a model from the catalog) or Local (a model from the local filesystem that is not in the catalog).
+
+```csharp
+[JsonPropertyName("providerType")]
+public required string ProviderType { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Publisher
+
+Publisher or organization name.
+
+```csharp
+[JsonPropertyName("publisher")]
+public string? Publisher { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### Runtime
+
+Runtime configuration details (device, execution provider) for executing the model.
+
+```csharp
+[JsonPropertyName("runtime")]
+public Runtime? Runtime { get; init; }
+```
+
+#### Property Value
+
+ [Runtime](Microsoft.AI.Foundry.Local.Runtime.md)?
+
+### SupportsToolCalling
+
+Indicates if the model supports tool/function calling capabilities.
+
+```csharp
+[JsonPropertyName("supportsToolCalling")]
+public bool? SupportsToolCalling { get; init; }
+```
+
+#### Property Value
+
+ [bool](https://learn.microsoft.com/dotnet/api/system.boolean)?
+
+### Task
+
+Primary task this model is intended for (e.g. text-generation, embeddings, speech-to-text).
+
+```csharp
+[JsonPropertyName("task")]
+public string? Task { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### Uri
+
+Source URI for the model artifacts.
+
+```csharp
+[JsonPropertyName("uri")]
+public required string Uri { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
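+## Example
+
+A minimal sketch of reading variant metadata through this record, assuming `variant` is a ModelVariant obtained from the catalog:
+
+```csharp
+ModelInfo info = variant.Info;
+Console.WriteLine($"{info.DisplayName ?? info.Name} ({info.Id})");
+Console.WriteLine($"  Task: {info.Task}, size: {info.FileSizeMb} MB, license: {info.License}");
+Console.WriteLine($"  Cached locally: {info.Cached}, supports tool calling: {info.SupportsToolCalling}");
+```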
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.ModelSettings.md b/docs/cs-api/Microsoft.AI.Foundry.Local.ModelSettings.md
new file mode 100644
index 0000000..accb989
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.ModelSettings.md
@@ -0,0 +1,45 @@
+# Class ModelSettings
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Optional settings applied to a model instance (e.g. default parameters).
+
+```csharp
+public record ModelSettings : IEquatable<ModelSettings>
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[ModelSettings](Microsoft.AI.Foundry.Local.ModelSettings.md)
+
+#### Implements
+
+[IEquatable](https://learn.microsoft.com/dotnet/api/system.iequatable\-1)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Parameters
+
+Collection of parameters for the model or null if none are defined.
+
+```csharp
+[JsonPropertyName("parameters")]
+public Parameter[]? Parameters { get; set; }
+```
+
+#### Property Value
+
+ [Parameter](Microsoft.AI.Foundry.Local.Parameter.md)\[\]?
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.ModelVariant.md b/docs/cs-api/Microsoft.AI.Foundry.Local.ModelVariant.md
new file mode 100644
index 0000000..b1adae7
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.ModelVariant.md
@@ -0,0 +1,278 @@
+# Class ModelVariant
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Represents a single, concrete downloadable model instance (a specific version + configuration) identified
+by a unique model Id and grouped under a broader alias shared with other device-specific variants.
+Provides:
+ - Direct access to catalog metadata via the Info property
+ - Lifecycle operations (download, load, unload, cache removal)
+ - State queries (cached vs. loaded) independent of other variants
+ - Resolution of the local cache path
+ - Creation of OpenAI‑style chat and audio clients once loaded
+Unlike Model, which orchestrates multiple variants under an alias, ModelVariant is
+one specific model instance.
+All public methods surface consistent error handling through FoundryLocalException.
+
+```csharp
+public class ModelVariant : IModel
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)
+
+#### Implements
+
+[IModel](Microsoft.AI.Foundry.Local.IModel.md)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Alias
+
+Alias grouping related variants.
+
+```csharp
+public string Alias { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Id
+
+Unique model identifier.
+
+```csharp
+public string Id { get; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Info
+
+Metadata record for this variant.
+
+```csharp
+public ModelInfo Info { get; }
+```
+
+#### Property Value
+
+ [ModelInfo](Microsoft.AI.Foundry.Local.ModelInfo.md)
+
+### Version
+
+Parsed version number (falling back to 0 if unavailable).
+
+```csharp
+public int Version { get; init; }
+```
+
+#### Property Value
+
+ [int](https://learn.microsoft.com/dotnet/api/system.int32)
+
+## Methods
+
+### DownloadAsync\(Action?, CancellationToken?\)
+
+Download the model files from the catalog.
+
+```csharp
+public Task DownloadAsync(Action<float>? downloadProgress = null, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`downloadProgress` [Action](https://learn.microsoft.com/dotnet/api/system.action\-1)<[float](https://learn.microsoft.com/dotnet/api/system.single)\>?
+
+Optional progress callback, invoked on a separate thread, that
+ reports download progress as a percentage (float); the final reported value is 100. When the download is complete and all callbacks
+ have been made, the Task for the download completes.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task representing the asynchronous operation.
+
+### GetAudioClientAsync\(CancellationToken?\)
+
+Get an OpenAI audio client for the model.
+
+```csharp
+public Task<OpenAIAudioClient> GetAudioClientAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[OpenAIAudioClient](Microsoft.AI.Foundry.Local.OpenAIAudioClient.md)\>
+
+Task that resolves to an OpenAIAudioClient instance.
+
+### GetChatClientAsync\(CancellationToken?\)
+
+Get an OpenAI chat client for the model.
+
+```csharp
+public Task<OpenAIChatClient> GetChatClientAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[OpenAIChatClient](Microsoft.AI.Foundry.Local.OpenAIChatClient.md)\>
+
+Task that resolves to an OpenAIChatClient instance.
+
+### GetPathAsync\(CancellationToken?\)
+
+Get the file system path where the model is cached.
+
+```csharp
+public Task<string> GetPathAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[string](https://learn.microsoft.com/dotnet/api/system.string)\>
+
+Task that resolves to the model path string.
+
+### IsCachedAsync\(CancellationToken?\)
+
+Check if the model is cached on the file system.
+
+```csharp
+public Task<bool> IsCachedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[bool](https://learn.microsoft.com/dotnet/api/system.boolean)\>
+
+Task that resolves to true if the model is cached, false otherwise.
+
+### IsLoadedAsync\(CancellationToken?\)
+
+Check if the model is currently loaded in the runtime.
+
+```csharp
+public Task<bool> IsLoadedAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)<[bool](https://learn.microsoft.com/dotnet/api/system.boolean)\>
+
+Task that resolves to true if the model is loaded, false otherwise.
+
+### LoadAsync\(CancellationToken?\)
+
+Load the model so it is available for inferencing.
+
+```csharp
+public Task LoadAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task representing the asynchronous operation.
+
+### RemoveFromCacheAsync\(CancellationToken?\)
+
+Remove the model files from the cache.
+
+```csharp
+public Task RemoveFromCacheAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task representing the asynchronous operation.
+
+### UnloadAsync\(CancellationToken?\)
+
+Unload the model from the runtime.
+
+```csharp
+public Task UnloadAsync(CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task)
+
+Task representing the asynchronous operation.
+
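+The following is a minimal sketch of the variant lifecycle using only the members documented above. It assumes a `ModelVariant` instance has already been resolved (for example, selected from a model's variants in the catalog); the method name `UseVariantAsync` is illustrative.
+
+```csharp
+using System;
+using System.Threading.Tasks;
+using Microsoft.AI.Foundry.Local;
+
+// Sketch only: "variant" is assumed to be an already-resolved ModelVariant.
+static async Task UseVariantAsync(ModelVariant variant)
+{
+    if (!await variant.IsCachedAsync())
+    {
+        // Report download progress as a percentage.
+        await variant.DownloadAsync(p => Console.Write($"\rDownloading: {p:F1}%"));
+        Console.WriteLine();
+    }
+
+    await variant.LoadAsync();
+    Console.WriteLine($"Model files: {await variant.GetPathAsync()}");
+
+    // Create an OpenAI-style chat client for the loaded variant.
+    var chatClient = await variant.GetChatClientAsync();
+    // ... issue chat completions with chatClient ...
+
+    await variant.UnloadAsync();
+}
+```
+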
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIAudioClient.md b/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIAudioClient.md
new file mode 100644
index 0000000..aa7eb4a
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIAudioClient.md
@@ -0,0 +1,79 @@
+# Class OpenAIAudioClient
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Audio transcription client exposing an OpenAI-compatible API surface, implemented with Betalgo.Ranul.OpenAI SDK types.
+Supports transcription of audio files.
+
+```csharp
+public class OpenAIAudioClient
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[OpenAIAudioClient](Microsoft.AI.Foundry.Local.OpenAIAudioClient.md)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Methods
+
+### TranscribeAudioAsync\(string, CancellationToken?\)
+
+Transcribe audio from a file.
+
+```csharp
+public Task TranscribeAudioAsync(string audioFilePath, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`audioFilePath` [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+Path to the file containing audio recording.
+Supported formats include mp3, wav and flac.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)
+
+Transcription response.
+
+### TranscribeAudioStreamingAsync\(string, CancellationToken\)
+
+Transcribe audio from a file with streamed output.
+
+```csharp
+public IAsyncEnumerable TranscribeAudioStreamingAsync(string audioFilePath, CancellationToken ct)
+```
+
+#### Parameters
+
+`audioFilePath` [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+Path to the file containing audio recording.
+Supported formats include mp3, wav and flac.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)
+
+Cancellation token.
+
+#### Returns
+
+ [IAsyncEnumerable](https://learn.microsoft.com/dotnet/api/system.collections.generic.iasyncenumerable\-1)
+
+An asynchronous enumerable of transcription responses.
+
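+A minimal usage sketch is shown below. It assumes `variant` is a loaded, audio-capable `ModelVariant` and that `recording.wav` is a placeholder path; the shape of the returned transcription objects follows the Betalgo.Ranul.OpenAI response types.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+using Microsoft.AI.Foundry.Local;
+
+// Sketch only: "variant" is assumed to be a loaded, audio-capable ModelVariant;
+// "recording.wav" is a placeholder path.
+static async Task TranscribeAsync(ModelVariant variant)
+{
+    OpenAIAudioClient audioClient = await variant.GetAudioClientAsync();
+
+    // One-shot transcription: writes the response object's text representation.
+    var transcription = await audioClient.TranscribeAudioAsync("recording.wav");
+    Console.WriteLine(transcription);
+
+    // Streaming transcription: responses arrive incrementally.
+    await foreach (var chunk in audioClient.TranscribeAudioStreamingAsync("recording.wav", CancellationToken.None))
+    {
+        Console.WriteLine(chunk);
+    }
+}
+```
+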
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIChatClient.ChatSettings.md b/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIChatClient.ChatSettings.md
new file mode 100644
index 0000000..41eff1d
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIChatClient.ChatSettings.md
@@ -0,0 +1,128 @@
+# Class OpenAIChatClient.ChatSettings
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Settings controlling chat completion generation. Only the subset of settings supported by Foundry Local is exposed.
+
+```csharp
+public record OpenAIChatClient.ChatSettings : IEquatable<OpenAIChatClient.ChatSettings>
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[OpenAIChatClient.ChatSettings](Microsoft.AI.Foundry.Local.OpenAIChatClient.ChatSettings.md)
+
+#### Implements
+
+[IEquatable](https://learn.microsoft.com/dotnet/api/system.iequatable\-1)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### FrequencyPenalty
+
+Penalizes repeated tokens.
+
+```csharp
+public float? FrequencyPenalty { get; set; }
+```
+
+#### Property Value
+
+ [float](https://learn.microsoft.com/dotnet/api/system.single)?
+
+### MaxTokens
+
+Maximum number of output tokens to generate.
+
+```csharp
+public int? MaxTokens { get; set; }
+```
+
+#### Property Value
+
+ [int](https://learn.microsoft.com/dotnet/api/system.int32)?
+
+### N
+
+Number of parallel completions to request.
+
+```csharp
+public int? N { get; set; }
+```
+
+#### Property Value
+
+ [int](https://learn.microsoft.com/dotnet/api/system.int32)?
+
+### PresencePenalty
+
+Penalizes new tokens based on whether they appear in the existing text.
+
+```csharp
+public float? PresencePenalty { get; set; }
+```
+
+#### Property Value
+
+ [float](https://learn.microsoft.com/dotnet/api/system.single)?
+
+### RandomSeed
+
+Optional random seed for deterministic sampling.
+
+```csharp
+public int? RandomSeed { get; set; }
+```
+
+#### Property Value
+
+ [int](https://learn.microsoft.com/dotnet/api/system.int32)?
+
+### Temperature
+
+Sampling temperature. Higher values increase randomness.
+
+```csharp
+public float? Temperature { get; set; }
+```
+
+#### Property Value
+
+ [float](https://learn.microsoft.com/dotnet/api/system.single)?
+
+### TopK
+
+Top-K sampling parameter.
+
+```csharp
+public int? TopK { get; set; }
+```
+
+#### Property Value
+
+ [int](https://learn.microsoft.com/dotnet/api/system.int32)?
+
+### TopP
+
+Top-P (nucleus) sampling parameter.
+
+```csharp
+public float? TopP { get; set; }
+```
+
+#### Property Value
+
+ [float](https://learn.microsoft.com/dotnet/api/system.single)?
+
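+As a brief illustration, the snippet below tunes a few of these settings on an existing chat client through its Settings property; the specific values are arbitrary.
+
+```csharp
+using Microsoft.AI.Foundry.Local;
+
+// Sketch only: "chatClient" is assumed to have been created via GetChatClientAsync;
+// the values below are illustrative.
+static void ConfigureChat(OpenAIChatClient chatClient)
+{
+    chatClient.Settings.Temperature = 0.7f;  // higher values increase randomness
+    chatClient.Settings.TopP = 0.9f;         // nucleus sampling
+    chatClient.Settings.MaxTokens = 256;     // cap generated output length
+    chatClient.Settings.RandomSeed = 42;     // more deterministic sampling
+}
+```
+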
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIChatClient.md b/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIChatClient.md
new file mode 100644
index 0000000..021b6ed
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.OpenAIChatClient.md
@@ -0,0 +1,93 @@
+# Class OpenAIChatClient
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Chat client exposing an OpenAI-compatible API surface, implemented with Betalgo.Ranul.OpenAI SDK types.
+Provides convenience methods for standard and streaming chat completions.
+
+```csharp
+public class OpenAIChatClient
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[OpenAIChatClient](Microsoft.AI.Foundry.Local.OpenAIChatClient.md)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Settings
+
+Settings applied to chat completions made with this client.
+
+```csharp
+public OpenAIChatClient.ChatSettings Settings { get; }
+```
+
+#### Property Value
+
+ [OpenAIChatClient](Microsoft.AI.Foundry.Local.OpenAIChatClient.md).[ChatSettings](Microsoft.AI.Foundry.Local.OpenAIChatClient.ChatSettings.md)
+
+## Methods
+
+### CompleteChatAsync\(IEnumerable, CancellationToken?\)
+
+Execute a chat completion request.
+To continue a conversation, add the prior response messages and the new prompt to the messages list.
+
+```csharp
+public Task CompleteChatAsync(IEnumerable<ChatMessage> messages, CancellationToken? ct = null)
+```
+
+#### Parameters
+
+`messages` [IEnumerable](https://learn.microsoft.com/dotnet/api/system.collections.generic.ienumerable\-1)
+
+Chat messages including system / user / assistant roles.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)?
+
+Optional cancellation token.
+
+#### Returns
+
+ [Task](https://learn.microsoft.com/dotnet/api/system.threading.tasks.task\-1)
+
+Chat completion response.
+
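+A non-streaming, two-turn sketch is shown below. It assumes `chatClient` was obtained via GetChatClientAsync; reading the reply through `Choices[0].Message.Content` is an assumption about the Betalgo.Ranul.OpenAI response shape.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading.Tasks;
+using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
+using Microsoft.AI.Foundry.Local;
+
+// Sketch only: two-turn conversation using CompleteChatAsync.
+static async Task ChatTwoTurnsAsync(OpenAIChatClient chatClient)
+{
+    var messages = new List<ChatMessage>
+    {
+        new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
+        new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
+    };
+
+    var first = await chatClient.CompleteChatAsync(messages);
+    var answer = first.Choices[0].Message.Content;
+    Console.WriteLine(answer);
+
+    // Continue the conversation: append the prior answer and the next prompt.
+    messages.Add(new ChatMessage { Role = "assistant", Content = answer });
+    messages.Add(new ChatMessage { Role = "user", Content = "Now explain it to a five-year-old." });
+
+    var second = await chatClient.CompleteChatAsync(messages);
+    Console.WriteLine(second.Choices[0].Message.Content);
+}
+```
+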
+### CompleteChatStreamingAsync\(IEnumerable, CancellationToken\)
+
+Execute a chat completion request with streamed output.
+To continue a conversation, add the prior response messages and the new prompt to the messages list.
+
+```csharp
+public IAsyncEnumerable CompleteChatStreamingAsync(IEnumerable<ChatMessage> messages, CancellationToken ct)
+```
+
+#### Parameters
+
+`messages` [IEnumerable](https://learn.microsoft.com/dotnet/api/system.collections.generic.ienumerable\-1)
+
+Chat messages including system / user / assistant roles.
+
+`ct` [CancellationToken](https://learn.microsoft.com/dotnet/api/system.threading.cancellationtoken)
+
+Cancellation token.
+
+#### Returns
+
+ [IAsyncEnumerable](https://learn.microsoft.com/dotnet/api/system.collections.generic.iasyncenumerable\-1)
+
+Async enumerable producing incremental chat completion responses.
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.Parameter.md b/docs/cs-api/Microsoft.AI.Foundry.Local.Parameter.md
new file mode 100644
index 0000000..c9d2342
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.Parameter.md
@@ -0,0 +1,56 @@
+# Class Parameter
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+A single configurable parameter that can influence model behavior.
+
+```csharp
+public record Parameter : IEquatable<Parameter>
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[Parameter](Microsoft.AI.Foundry.Local.Parameter.md)
+
+#### Implements
+
+[IEquatable](https://learn.microsoft.com/dotnet/api/system.iequatable\-1)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Name
+
+Parameter name.
+
+```csharp
+public required string Name { get; set; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Value
+
+Optional parameter value as string.
+
+```csharp
+public string? Value { get; set; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.PromptTemplate.md b/docs/cs-api/Microsoft.AI.Foundry.Local.PromptTemplate.md
new file mode 100644
index 0000000..ff77442
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.PromptTemplate.md
@@ -0,0 +1,85 @@
+# Class PromptTemplate
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Template segments used to build a prompt for a model.
+For AzureFoundry model types you do NOT need to populate this; Foundry Local will handle prompt construction automatically.
+
+```csharp
+public record PromptTemplate : IEquatable<PromptTemplate>
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[PromptTemplate](Microsoft.AI.Foundry.Local.PromptTemplate.md)
+
+#### Implements
+
+[IEquatable](https://learn.microsoft.com/dotnet/api/system.iequatable\-1)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### Assistant
+
+Assistant response segment used when constructing multi‑turn prompts.
+
+```csharp
+[JsonPropertyName("assistant")]
+public string Assistant { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### Prompt
+
+Raw prompt text passed to the model.
+
+```csharp
+[JsonPropertyName("prompt")]
+public string Prompt { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
+### System
+
+Optional system instruction segment.
+
+```csharp
+[JsonPropertyName("system")]
+public string? System { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
+### User
+
+Optional user message segment.
+
+```csharp
+[JsonPropertyName("user")]
+public string? User { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)?
+
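+For illustration only, a PromptTemplate can be populated with model-specific segment strings via an object initializer (its properties are init-only); the tag strings below are placeholders, not a documented template format.
+
+```csharp
+using Microsoft.AI.Foundry.Local;
+
+// Sketch only: placeholder segment strings, not a real model's template.
+var template = new PromptTemplate
+{
+    System = "<|system|>You are a helpful assistant.<|end|>",
+    User = "<|user|>",
+    Assistant = "<|assistant|>",
+    Prompt = "<|user|>Hello!<|end|><|assistant|>"
+};
+```
+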
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.Runtime.md b/docs/cs-api/Microsoft.AI.Foundry.Local.Runtime.md
new file mode 100644
index 0000000..f7ee7e2
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.Runtime.md
@@ -0,0 +1,58 @@
+# Class Runtime
+
+Namespace: [Microsoft.AI.Foundry.Local](Microsoft.AI.Foundry.Local.md)
+Assembly: Microsoft.AI.Foundry.Local.dll
+
+Runtime configuration details describing how the model will execute.
+
+```csharp
+public record Runtime : IEquatable<Runtime>
+```
+
+#### Inheritance
+
+[object](https://learn.microsoft.com/dotnet/api/system.object) ←
+[Runtime](Microsoft.AI.Foundry.Local.Runtime.md)
+
+#### Implements
+
+[IEquatable](https://learn.microsoft.com/dotnet/api/system.iequatable\-1)
+
+#### Inherited Members
+
+[object.Equals\(object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\)),
+[object.Equals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.equals\#system\-object\-equals\(system\-object\-system\-object\)),
+[object.GetHashCode\(\)](https://learn.microsoft.com/dotnet/api/system.object.gethashcode),
+[object.GetType\(\)](https://learn.microsoft.com/dotnet/api/system.object.gettype),
+[object.MemberwiseClone\(\)](https://learn.microsoft.com/dotnet/api/system.object.memberwiseclone),
+[object.ReferenceEquals\(object?, object?\)](https://learn.microsoft.com/dotnet/api/system.object.referenceequals),
+[object.ToString\(\)](https://learn.microsoft.com/dotnet/api/system.object.tostring)
+
+## Properties
+
+### DeviceType
+
+Device type the model will run on (e.g. CPU, GPU, NPU).
+
+```csharp
+[JsonPropertyName("deviceType")]
+public DeviceType DeviceType { get; init; }
+```
+
+#### Property Value
+
+ [DeviceType](Microsoft.AI.Foundry.Local.DeviceType.md)
+
+### ExecutionProvider
+
+Execution provider name (e.g. QNNExecutionProvider, CUDAExecutionProvider, WebGPUExecutionProvider, etc.). Open‑ended string.
+
+```csharp
+[JsonPropertyName("executionProvider")]
+public string ExecutionProvider { get; init; }
+```
+
+#### Property Value
+
+ [string](https://learn.microsoft.com/dotnet/api/system.string)
+
diff --git a/docs/cs-api/Microsoft.AI.Foundry.Local.md b/docs/cs-api/Microsoft.AI.Foundry.Local.md
new file mode 100644
index 0000000..3839ab0
--- /dev/null
+++ b/docs/cs-api/Microsoft.AI.Foundry.Local.md
@@ -0,0 +1,101 @@
+# Namespace Microsoft.AI.Foundry.Local
+
+### Classes
+
+ [OpenAIChatClient.ChatSettings](Microsoft.AI.Foundry.Local.OpenAIChatClient.ChatSettings.md)
+
+Settings controlling chat completion generation. Only the subset of settings supported by Foundry Local is exposed.
+
+ [Configuration](Microsoft.AI.Foundry.Local.Configuration.md)
+
+Foundry Local configuration used to initialize the singleton.
+
+ [FoundryLocalException](Microsoft.AI.Foundry.Local.FoundryLocalException.md)
+
+Exception type thrown by the Foundry Local SDK to represent operational or initialization errors.
+
+ [FoundryLocalManager](Microsoft.AI.Foundry.Local.FoundryLocalManager.md)
+
+Entry point for Foundry Local SDK providing initialization, catalog access, model management
+and optional web service hosting.
+
+ [Model](Microsoft.AI.Foundry.Local.Model.md)
+
+Represents a family of related model variants (versions or configurations) that share a common alias.
+Acts as a façade over its variants, letting you:
+ - enumerate and select a specific variant
+ - prefer a locally cached variant automatically
+ - resolve the latest version of a given variant
+ - download, load, unload, or remove from cache the currently selected variant
+ - create chat and audio clients for the currently selected variant.
+Use ModelVariant when you need per‑variant metadata; use Model when you want alias‑level orchestration.
+
+ [ModelInfo](Microsoft.AI.Foundry.Local.ModelInfo.md)
+
+Full descriptive metadata for a model variant within the catalog.
+
+ [ModelSettings](Microsoft.AI.Foundry.Local.ModelSettings.md)
+
+Optional settings applied to a model instance (e.g. default parameters).
+
+ [ModelVariant](Microsoft.AI.Foundry.Local.ModelVariant.md)
+
+Represents a single, concrete downloadable model instance (a specific version + configuration) identified
+by a unique model Id and grouped under a broader alias shared with other device-specific variants.
+Provides:
+ - Direct access to catalog metadata via the Info property
+ - Lifecycle operations (download, load, unload, cache removal)
+ - State queries (cached vs. loaded) independent of other variants
+ - Resolution of the local cache path
+ - Creation of OpenAI‑style chat and audio clients once loaded
+Unlike Model, which orchestrates multiple variants under an alias, ModelVariant is
+the one specific model instance.
+All public methods surface consistent error handling through FoundryLocalException.
+
+ [OpenAIAudioClient](Microsoft.AI.Foundry.Local.OpenAIAudioClient.md)
+
+Audio transcription client exposing an OpenAI-compatible API surface, implemented with Betalgo.Ranul.OpenAI SDK types.
+Supports transcription of audio files.
+
+ [OpenAIChatClient](Microsoft.AI.Foundry.Local.OpenAIChatClient.md)
+
+Chat client exposing an OpenAI-compatible API surface, implemented with Betalgo.Ranul.OpenAI SDK types.
+Provides convenience methods for standard and streaming chat completions.
+
+ [Parameter](Microsoft.AI.Foundry.Local.Parameter.md)
+
+A single configurable parameter that can influence model behavior.
+
+ [PromptTemplate](Microsoft.AI.Foundry.Local.PromptTemplate.md)
+
+Template segments used to build a prompt for a model.
+For AzureFoundry model types you do NOT need to populate this; Foundry Local will handle prompt construction automatically.
+
+ [Runtime](Microsoft.AI.Foundry.Local.Runtime.md)
+
+Runtime configuration details describing how the model will execute.
+
+ [Configuration.WebService](Microsoft.AI.Foundry.Local.Configuration.WebService.md)
+
+Configuration settings if the optional web service is used.
+
+### Interfaces
+
+ [ICatalog](Microsoft.AI.Foundry.Local.ICatalog.md)
+
+ [IModel](Microsoft.AI.Foundry.Local.IModel.md)
+
+Common operations for a model variant or model abstraction including caching, loading
+and client creation helpers.
+
+### Enums
+
+ [DeviceType](Microsoft.AI.Foundry.Local.DeviceType.md)
+
+Device types supported by the runtime for model execution.
+
+ [LogLevel](Microsoft.AI.Foundry.Local.LogLevel.md)
+
+Logging verbosity levels used by the Foundry Local SDK. These levels align with Serilog (Verbose, Debug, Information, Warning, Error, Fatal)
+and differ from Microsoft.Extensions.Logging.LogLevel, which includes Trace, Critical, and None.
+
diff --git a/docs/how-to/compile-models-for-foundry-local.md b/docs/how-to/compile-models-for-foundry-local.md
deleted file mode 100644
index fe7667f..0000000
--- a/docs/how-to/compile-models-for-foundry-local.md
+++ /dev/null
@@ -1,278 +0,0 @@
-# How to compile Hugging Face models to run on Foundry Local
-
-Foundry Local runs ONNX models on your device with high performance. While the model catalog offers _out-of-the-box_ precompiled options, you can use any model in the ONNX format.
-
-To compile existing models in Safetensor or PyTorch format into the ONNX format, you can use [Olive](https://microsoft.github.io/Olive). Olive is a tool that optimizes models to ONNX format, making them suitable for deployment in Foundry Local. It uses techniques like _quantization_ and _graph optimization_ to improve performance.
-
-This guide shows you how to:
-
-- **Convert and optimize** models from Hugging Face to run in Foundry Local. You'll use the `Llama-3.2-1B-Instruct` model as an example, but you can use any generative AI model from Hugging Face.
-- **Run** your optimized models with Foundry Local
-
-## Prerequisites
-
-- Python 3.10 or later
-
-## Install Olive
-
-[Olive](https://github.com/microsoft/olive) is a tool that optimizes models to ONNX format.
-
-### Bash
-
-```bash
-pip install olive-ai[auto-opt]
-```
-
-### PowerShell
-
-```powershell
-pip install olive-ai[auto-opt]
-```
-
-**💡 TIP**: For best results, install Olive in a virtual environment using [venv](https://docs.python.org/3/library/venv.html) or [conda](https://www.anaconda.com/docs/getting-started/miniconda/main).
-
-## Sign in to Hugging Face
-
-You optimize the `Llama-3.2-1B-Instruct` model, which requires Hugging Face authentication:
-
-### Bash
-
-```bash
-huggingface-cli login
-```
-
-### PowerShell
-
-```powershell
-huggingface-cli login
-```
-
-**Note**: You must first [create a Hugging Face token](https://huggingface.co/docs/hub/security-tokens) and [request model access](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) before proceeding.
-
-## Compile the model
-
-### Step 1: Run the Olive auto-opt command
-
-Use the Olive `auto-opt` command to download, convert, quantize, and optimize the model:
-
-### Bash
-
-```bash
-olive auto-opt \
- --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
- --trust_remote_code \
- --output_path models/llama \
- --device cpu \
- --provider CPUExecutionProvider \
- --use_ort_genai \
- --precision int4 \
- --log_level 1
-```
-
-### PowerShell
-
-```powershell
-olive auto-opt `
- --model_name_or_path meta-llama/Llama-3.2-1B-Instruct `
- --trust_remote_code `
- --output_path models/llama `
- --device cpu `
- --provider CPUExecutionProvider `
- --use_ort_genai `
- --precision int4 `
- --log_level 1
-```
-
-**Note**: The compilation process takes approximately 60 seconds, plus extra time for model download.
-
-The command uses the following parameters:
-
-| Parameter | Description |
-| -------------------- | --------------------------------------------------------------------------------- |
-| `model_name_or_path` | Model source: Hugging Face ID, local path, or Azure AI Model registry ID |
-| `output_path` | Where to save the optimized model |
-| `device` | Target hardware: `cpu`, `gpu`, or `npu` |
-| `provider` | Execution provider (for example, `CPUExecutionProvider`, `CUDAExecutionProvider`) |
-| `precision` | Model precision: `fp16`, `fp32`, `int4`, or `int8` |
-| `use_ort_genai` | Creates inference configuration files |
-
-**💡 TIP**: If you have a local copy of the model, you can use a local path instead of the Hugging Face ID. For example, `--model_name_or_path models/llama-3.2-1B-Instruct`. Olive handles the conversion, optimization, and quantization automatically.
-
-### Step 2: Rename the output model
-
-Olive places files in a generic `model` directory. Rename it to make it easier to use:
-
-### Bash
-
-```bash
-cd models/llama
-mv model llama-3.2
-```
-
-### PowerShell
-
-```powershell
-cd models/llama
-Rename-Item -Path "model" -NewName "llama-3.2"
-```
-
-### Step 3: Create chat template file
-
-A chat template is a structured format that defines how input and output messages are processed for a conversational AI model. It specifies the roles (for example, system, user, assistant) and the structure of the conversation, ensuring that the model understands the context and generates appropriate responses.
-
-Foundry Local requires a chat template JSON file called `inference_model.json` in order to generate the appropriate responses. The template properties are the model name and a `PromptTemplate` object, which contains a `{Content}` placeholder that Foundry Local injects at runtime with the user prompt.
-
-```json
-{
- "Name": "llama-3.2",
- "PromptTemplate": {
- "assistant": "{Content}",
- "prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{Content}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
- }
-}
-```
-
-To create the chat template file, you can use the `apply_chat_template` method from the Hugging Face library:
-
-**Note**: The following example uses the Python Hugging Face library to create a chat template. The Hugging Face library is a dependency for Olive, so if you're using the same Python virtual environment you don't need to install. If you're using a different environment, install the library with `pip install transformers`.
-
-```python
-# generate_inference_model.py
-# This script generates the inference_model.json file for the Llama-3.2 model.
-import json
-import os
-from transformers import AutoTokenizer
-
-model_path = "models/llama/llama-3.2"
-
-tokenizer = AutoTokenizer.from_pretrained(model_path)
-chat = [
- {"role": "system", "content": "You are a helpful assistant."},
- {"role": "user", "content": "{Content}"},
-]
-
-
-template = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
-
-json_template = {
- "Name": "llama-3.2",
- "PromptTemplate": {
- "assistant": "{Content}",
- "prompt": template
- }
-}
-
-json_file = os.path.join(model_path, "inference_model.json")
-
-with open(json_file, "w") as f:
- json.dump(json_template, f, indent=2)
-```
-
-Run the script using:
-
-```bash
-python generate_inference_model.py
-```
-
-## Run the model
-
-You can run your compiled model using the Foundry Local CLI, REST API, or OpenAI Python SDK. First, change the model cache directory to the models directory you created in the previous step:
-
-### Bash
-
-```bash
-foundry cache cd models
-foundry cache ls # should show llama-3.2
-```
-
-### PowerShell
-
-```powershell
-foundry cache cd models
-foundry cache ls # should show llama-3.2
-```
-
-### Using the Foundry Local CLI
-
-### Bash
-
-```bash
-foundry model run llama-3.2 --verbose
-```
-
-### PowerShell
-
-```powershell
-foundry model run llama-3.2 --verbose
-```
-
-### Using the REST API
-
-- Note that the port will be dynamically assigned, so check the logs for the correct port.
-
-### Bash
-
-```bash
-curl -X POST http://localhost:5273/v1/chat/completions \
--H "Content-Type: application/json" \
--d '{
- "model": "llama-3.2",
- "messages": [{"role": "user", "content": "What is the capital of France?"}],
- "temperature": 0.7,
- "max_tokens": 50,
- "stream": true
-}'
-```
-
-### PowerShell
-
-```powershell
-Invoke-RestMethod -Uri http://localhost:5273/v1/chat/completions `
- -Method Post `
- -ContentType "application/json" `
- -Body '{
- "model": "llama-3.2",
- "messages": [{"role": "user", "content": "What is the capital of France?"}],
- "temperature": 0.7,
- "max_tokens": 50,
- "stream": true
- }'
-```
-
-### Using the OpenAI Python SDK
-
-The OpenAI Python SDK is a convenient way to interact with the Foundry Local REST API. You can install it using:
-
-```bash
-pip install openai
-```
-
-Then, you can use the following code to run the model (changing the port as needed):
-
-```python
-from openai import OpenAI
-
-client = OpenAI(
- base_url="http://localhost:5273/v1",
- api_key="none", # required but not used
-)
-
-stream = client.chat.completions.create(
- model="llama-3.2",
- messages=[{"role": "user", "content": "What is the capital of France?"}],
- temperature=0.7,
- max_tokens=50,
- stream=True,
-)
-
-for event in stream:
- print(event.choices[0].delta.content, end="", flush=True)
-print("\n\n")
-```
-
-**💡 TIP**: You can use any language that supports HTTP requests. See [Integrate with Inferencing SDKs](integrate-with-inference-sdks.md) for more options.
-
-## Next steps
-
-- [Learn more about Olive](https://microsoft.github.io/Olive/)
-- [Integrate Foundry Local with Inferencing SDKs](integrate-with-inference-sdks.md)
diff --git a/docs/how-to/integrate-with-inference-sdks.md b/docs/how-to/integrate-with-inference-sdks.md
deleted file mode 100644
index 0e39ca5..0000000
--- a/docs/how-to/integrate-with-inference-sdks.md
+++ /dev/null
@@ -1,145 +0,0 @@
-# Integrate Foundry Local with Inferencing SDKs
-
-Foundry Local provides a REST API endpoint that makes it easy to integrate with various inferencing SDKs and programming languages. This guide shows you how to connect your applications to locally running AI models using popular SDKs.
-
-## Prerequisites
-
-- Foundry Local installed and running on your system
-- A model loaded into the service (use `foundry model load `)
-- Basic knowledge of the programming language you want to use for integration
-- Development environment for your chosen language
-
-## Understanding the REST API
-
-When Foundry Local is running, it exposes an OpenAI-compatible REST API endpoint at `http://localhost:PORT/v1`. This endpoint supports standard API operations like:
-
-- `/completions` - For text completion
-- `/chat/completions` - For chat-based interactions
-- `/models` - To list available models
-
-This port will be dynamically assigned, so check the logs for the correct port.
-
-## Language Examples
-
-### Python
-
-```python
-from openai import OpenAI
-
-# Configure the client to use your local endpoint
-client = OpenAI(
- base_url="http://localhost:5273/v1",
- api_key="not-needed" # API key isn't used but the client requires one
-)
-
-# Chat completion example
-response = client.chat.completions.create(
- model="Phi-3.5-mini-instruct-generic-cpu", # Use the id of your loaded model, found in 'foundry service ps'
- messages=[
- {"role": "system", "content": "You are a helpful assistant."},
- {"role": "user", "content": "What is the capital of France?"}
- ],
- max_tokens=1000
-)
-
-print(response.choices[0].message.content)
-```
-
-Check out the streaming example [here](../includes/integrate-examples/python.md).
-
-### REST API
-
-```bash
-curl http://localhost:5273/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- model="Phi-3.5-mini-instruct-generic-cpu",
- "messages": [
- {
- "role": "system",
- "content": "You are a helpful assistant."
- },
- {
- "role": "user",
- "content": "What is the capital of France?"
- }
- ],
- "max_tokens": 1000
- }'
-```
-
-Check out the streaming example [here](../includes/integrate-examples/rest.md).
-
-### JavaScript
-
-```javascript
-import OpenAI from "openai";
-
-// Configure the client to use your local endpoint
-const openai = new OpenAI({
- baseURL: "http://localhost:5273/v1",
- apiKey: "not-needed", // API key isn't used but the client requires one
-});
-
-async function generateText() {
- const response = await openai.chat.completions.create({
- model: "Phi-3.5-mini-instruct-generic-cpu", // Use the id of your loaded model, found in 'foundry service ps'
- messages: [
- { role: "system", content: "You are a helpful assistant." },
- { role: "user", content: "What is the capital of France?" },
- ],
- max_tokens: 1000,
- });
-
- console.log(response.choices[0].message.content);
-}
-
-generateText();
-```
-
-Check out the streaming example [here](../includes/integrate-examples/javascript.md).
-
-### C#
-
-```csharp
-using Azure.AI.OpenAI;
-using Azure;
-
-// Configure the client to use your local endpoint
-OpenAIClient client = new OpenAIClient(
- new Uri("http://localhost:5273/v1"),
- new AzureKeyCredential("not-needed") // API key isn't used but the client requires one
-);
-
-// Chat completion example
-var chatCompletionsOptions = new ChatCompletionsOptions()
-{
- Messages =
- {
- new ChatMessage(ChatRole.System, "You are a helpful assistant."),
- new ChatMessage(ChatRole.User, "What is the capital of France?")
- },
- MaxTokens = 1000
-};
-
-Response response = await client.GetChatCompletionsAsync(
- "Phi-3.5-mini-instruct-generic-cpu", // Use the id of your loaded model, found in 'foundry service ps'
- chatCompletionsOptions
-);
-
-Console.WriteLine(response.Value.Choices[0].Message.Content);
-```
-
-Check out the streaming example [here](../includes/integrate-examples/csharp.md).
-
-## Best Practices
-
-1. **Error Handling**: Implement robust error handling to manage cases when the local service is unavailable or a model isn't loaded.
-2. **Resource Management**: Be mindful of your local resources. Monitor CPU/RAM usage when making multiple concurrent requests.
-3. **Fallback Strategy**: Consider implementing a fallback to cloud services for when local inference is insufficient.
-4. **Model Preloading**: For production applications, ensure your model is preloaded before starting your application.
-
-## Next steps
-
-- [Compile Hugging Face models for Foundry Local](./compile-models-for-foundry-local.md)
-- [Explore the Foundry Local CLI reference](../reference/reference-cli.md)
diff --git a/docs/how-to/manage.md b/docs/how-to/manage.md
deleted file mode 100644
index bf5ddf5..0000000
--- a/docs/how-to/manage.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# Manage Foundry Local
-
-TODO
-
-## Prerequisites
-
-- TODO
-
-## Section
-
-TODO
-
-## Next step
-
-TODO
diff --git a/docs/includes/integrate-examples/csharp.md b/docs/includes/integrate-examples/csharp.md
deleted file mode 100644
index d4c9e1a..0000000
--- a/docs/includes/integrate-examples/csharp.md
+++ /dev/null
@@ -1,66 +0,0 @@
-## Basic Integration
-
-```csharp
-// Install with: dotnet add package Azure.AI.OpenAI
-using Azure.AI.OpenAI;
-using Azure;
-
-// Create a client. Note the port is dynamically assigned, so check the logs for the correct port.
-OpenAIClient client = new OpenAIClient(
- new Uri("http://localhost:5273/v1"),
- new AzureKeyCredential("not-needed-for-local")
-);
-
-// Chat completions
-ChatCompletionsOptions options = new ChatCompletionsOptions()
-{
- Messages =
- {
- new ChatMessage(ChatRole.User, "What is Foundry Local?")
- },
- DeploymentName = "Phi-4-mini-instruct-cuda-gpu" // Use model name here
-};
-
-Response response = await client.GetChatCompletionsAsync(options);
-string completion = response.Value.Choices[0].Message.Content;
-Console.WriteLine(completion);
-```
-
-## Streaming Response
-
-```csharp
-// Install with: dotnet add package Azure.AI.OpenAI
-using Azure.AI.OpenAI;
-using Azure;
-using System;
-using System.Threading.Tasks;
-
-async Task StreamCompletionsAsync()
-{
-// Note the port is dynamically assigned, so check the logs for the correct port.
- OpenAIClient client = new OpenAIClient(
- new Uri("http://localhost:5273/v1"),
- new AzureKeyCredential("not-needed-for-local")
- );
-
- ChatCompletionsOptions options = new ChatCompletionsOptions()
- {
- Messages =
- {
- new ChatMessage(ChatRole.User, "Write a short story about AI")
- },
- DeploymentName = "Phi-4-mini-instruct-cuda-gpu"
- };
-
- await foreach (StreamingChatCompletionsUpdate update in client.GetChatCompletionsStreaming(options))
- {
- if (update.ContentUpdate != null)
- {
- Console.Write(update.ContentUpdate);
- }
- }
-}
-
-// Call the async method
-await StreamCompletionsAsync();
-```
diff --git a/docs/includes/integrate-examples/javascript.md b/docs/includes/integrate-examples/javascript.md
deleted file mode 100644
index 9cd136b..0000000
--- a/docs/includes/integrate-examples/javascript.md
+++ /dev/null
@@ -1,134 +0,0 @@
-## Using the OpenAI Node.js SDK
-
-```javascript
-// Install with: npm install openai
-import OpenAI from "openai";
-// Note the port is dynamically assigned, so check the logs for the correct port.
-const openai = new OpenAI({
- baseURL: "http://localhost:5273/v1",
- apiKey: "not-needed-for-local",
-});
-
-async function generateText() {
- const response = await openai.chat.completions.create({
- model: "Phi-4-mini-instruct-cuda-gpu",
- messages: [
- {
- role: "user",
- content: "How can I integrate Foundry Local with my app?",
- },
- ],
- });
-
- console.log(response.choices[0].message.content);
-}
-
-generateText();
-```
-
-## Using Fetch API
-
-// Note the port is dynamically assigned, so check the logs for the correct port.
-
-```javascript
-async function queryModel() {
- const response = await fetch("http://localhost:5273/v1/chat/completions", {
- method: "POST",
- headers: {
- "Content-Type": "application/json",
- },
- body: JSON.stringify({
- model: "Phi-4-mini-instruct-cuda-gpu",
- messages: [
- { role: "user", content: "What are the advantages of Foundry Local?" },
- ],
- }),
- });
-
- const data = await response.json();
- console.log(data.choices[0].message.content);
-}
-
-queryModel();
-```
-
-## Streaming Responses
-
-### Using OpenAI SDK
-
-```javascript
-// Install with: npm install openai
-import OpenAI from "openai";
-
-const openai = new OpenAI({
- baseURL: "http://localhost:5273/v1",
- apiKey: "not-needed-for-local",
-});
-
-async function streamCompletion() {
- const stream = await openai.chat.completions.create({
- model: "Phi-4-mini-instruct-cuda-gpu",
- messages: [{ role: "user", content: "Write a short story about AI" }],
- stream: true,
- });
-
- for await (const chunk of stream) {
- if (chunk.choices[0]?.delta?.content) {
- process.stdout.write(chunk.choices[0].delta.content);
- }
- }
-}
-
-streamCompletion();
-```
-
-### Using Fetch API and ReadableStream
-
-```javascript
-async function streamWithFetch() {
- const response = await fetch("http://localhost:5273/v1/chat/completions", {
- method: "POST",
- headers: {
- "Content-Type": "application/json",
- Accept: "text/event-stream",
- },
- body: JSON.stringify({
- model: "Phi-4-mini-instruct-cuda-gpu",
- messages: [{ role: "user", content: "Write a short story about AI" }],
- stream: true,
- }),
- });
-
- const reader = response.body.getReader();
- const decoder = new TextDecoder();
-
- while (true) {
- const { done, value } = await reader.read();
- if (done) break;
-
- const chunk = decoder.decode(value);
- const lines = chunk.split("\n").filter((line) => line.trim() !== "");
-
- for (const line of lines) {
- if (line.startsWith("data: ")) {
- const data = line.substring(6);
- if (data === "[DONE]") continue;
-
- try {
- const json = JSON.parse(data);
- const content = json.choices[0]?.delta?.content || "";
- if (content) {
- // Print to console without line breaks, similar to process.stdout.write
- process.stdout.write(content);
- }
- } catch (e) {
- console.error("Error parsing JSON:", e);
- }
- }
- }
- }
-}
-
-// Call the function to start streaming
-streamWithFetch();
-```
diff --git a/docs/includes/integrate-examples/python.md b/docs/includes/integrate-examples/python.md
deleted file mode 100644
index 9e1bcf8..0000000
--- a/docs/includes/integrate-examples/python.md
+++ /dev/null
@@ -1,67 +0,0 @@
-## Using the OpenAI SDK
-
-```python
-# Install with: pip install openai
-import openai
-
-# Configure the client to use your local endpoint, noting the port is dynamically assigned
-client = openai.OpenAI(
- base_url="http://localhost:5273/v1",
- api_key="not-needed-for-local" # API key is not required for local usage
-)
-
-# Chat completions
-response = client.chat.completions.create(
- model="Phi-4-mini-instruct-cuda-gpu", # Use a model loaded in your service
- messages=[
- {"role": "user", "content": "Explain how Foundry Local works."}
- ]
-)
-
-print(response.choices[0].message.content)
-```
-
-## Using Direct HTTP Requests
-
-```python
-# Install with: pip install requests
-import requests
-import json
-# note the port is dynamically assigned, so check the logs for the correct port
-url = "http://localhost:5273/v1/chat/completions"
-
-payload = {
- "model": "Phi-4-mini-instruct-cuda-gpu",
- "messages": [
- {"role": "user", "content": "What are the benefits of running AI models locally?"}
- ]
-}
-
-headers = {
- "Content-Type": "application/json"
-}
-
-response = requests.post(url, headers=headers, data=json.dumps(payload))
-print(response.json()["choices"][0]["message"]["content"])
-```
-
-## Streaming Response
-
-```python
-import openai
-# note the port is dynamically assigned, so check the logs for the correct port
-client = openai.OpenAI(
- base_url="http://localhost:5273/v1",
- api_key="not-needed-for-local"
-)
-
-stream = client.chat.completions.create(
- model="Phi-4-mini-instruct-cuda-gpu",
- messages=[{"role": "user", "content": "Write a short story about AI"}],
- stream=True
-)
-
-for chunk in stream:
- if chunk.choices[0].delta.content is not None:
- print(chunk.choices[0].delta.content, end="")
-```
diff --git a/docs/includes/integrate-examples/rest.md b/docs/includes/integrate-examples/rest.md
deleted file mode 100644
index 7848102..0000000
--- a/docs/includes/integrate-examples/rest.md
+++ /dev/null
@@ -1,19 +0,0 @@
-## Basic Request
-
-For quick tests or integrations with command line scripts:
-
-```bash
-curl http://localhost:5273/v1/chat/completions ^
- -H "Content-Type: application/json" ^
- -d "{\"model\": \"Phi-4-mini-instruct-cuda-gpu\", \"messages\": [{\"role\": \"user\", \"content\": \"Tell me a short story\"}]}"
-```
-
-## Streaming Response
-
-**Note**: Please change the port to your dynamically assigned one. The example here works, but because there's no cleansing of the output, it may not be as clean as the other examples.
-
-```bash
-curl http://localhost:5273/v1/chat/completions ^
- -H "Content-Type: application/json" ^
- -d "{\"model\": \"Phi-4-mini-instruct-cuda-gpu\", \"messages\": [{\"role\": \"user\", \"content\": \"Tell me a short story\"}], \"stream\": true}"
-```
diff --git a/docs/media/architecture/foundry-local-arch.png b/docs/media/architecture/foundry-local-arch.png
deleted file mode 100644
index cf5066d..0000000
Binary files a/docs/media/architecture/foundry-local-arch.png and /dev/null differ
diff --git a/docs/reference/reference-cli.md b/docs/reference/reference-cli.md
deleted file mode 100644
index b489762..0000000
--- a/docs/reference/reference-cli.md
+++ /dev/null
@@ -1,112 +0,0 @@
-# Foundry Local CLI Reference
-
-This article provides a comprehensive reference for the Foundry Local command-line interface (CLI). The foundry CLI is structured into several categories to help you manage models, control the service, and maintain your local cache.
-
-## Overview
-
-To see all available commands, use the help option:
-
-```bash
-foundry --help
-```
-
-The foundry CLI is structured into these main categories:
-
-- **Model**: Commands related to managing and running models
-- **Service**: Commands for managing the Foundry Local service
-- **Cache**: Commands for managing the local cache where models are stored
-
-## Model commands
-
-The following table summarizes the commands related to managing and running models:
-
-| **Command** | **Description** |
-| -------------------------------- | -------------------------------------------------------------------------------- |
-| `foundry model --help` | Displays all available model-related commands and their usage. |
-| `foundry model run ` | Runs a specified model, downloading it if not cached, and starts an interaction. |
-| `foundry model list` | Lists all available models for local use. |
-| `foundry model info ` | Displays detailed information about a specific model. |
-| `foundry model download ` | Downloads a model to the local cache without running it. |
-| `foundry model load ` | Loads a model into the service. |
-| `foundry model unload ` | Unloads a model from the service. |
-
-## Service commands
-
-The following table summarizes the commands related to managing the Foundry Local service:
-
-| **Command** | **Description** |
-| ------------------------------- | ---------------------------------------------------------------- |
-| `foundry service --help` | Displays all available service-related commands and their usage. |
-| `foundry service start` | Starts the Foundry Local service. |
-| `foundry service stop` | Stops the Foundry Local service. |
-| `foundry service restart` | Restarts the Foundry Local service. |
-| `foundry service status` | Displays the current status of the Foundry Local service. |
-| `foundry service ps` | Lists all models currently loaded in the Foundry Local service. |
-| `foundry service logs` | Displays the logs of the Foundry Local service. |
-| `foundry service set ` | Set configuration of the Foundry Local service. |
-
-## Cache commands
-
-The following table summarizes the commands related to managing the local cache where models are stored:
-
-| **Command** | **Description** |
-| ------------------------------ | -------------------------------------------------------------- |
-| `foundry cache --help` | Displays all available cache-related commands and their usage. |
-| `foundry cache location` | Displays the current cache directory. |
-| `foundry cache list` | Lists all models stored in the local cache. |
-| `foundry cache remove ` | Deletes a model from the local cache. |
-| `foundry cache cd ` | Changes the cache directory. |
-
-## Common CLI usage examples
-
-### Quick start with a model
-
-```bash
-# Download and run a model interactively
-foundry model run phi-4-mini
-
-# Check model information before running
-foundry model info phi-4-mini
-
-# Download a model without running it
-foundry model download phi-4-mini
-```
-
-### Managing the service
-
-```bash
-# Check service status
-foundry service status
-
-# View active models
-foundry service ps
-
-# Restart the service when troubleshooting
-foundry service restart
-```
-
-### Working with the cache
-
-```bash
-# List cached models
-foundry cache list
-
-# Remove a model that's no longer needed
-foundry cache remove old-model
-
-# Change cache location to a larger drive
-foundry cache cd /path/to/larger/drive
-```
-
-### Advanced usage
-
-```bash
-# View detailed model license information
-foundry model info phi-4-mini --license
-
-# Generate diagnostic logs for support
-foundry zip-logs
-
-# Configure GPU settings for better performance
-foundry service set --gpu 0
-```
diff --git a/docs/reference/reference-rest.md b/docs/reference/reference-rest.md
deleted file mode 100644
index 2317de2..0000000
--- a/docs/reference/reference-rest.md
+++ /dev/null
@@ -1,519 +0,0 @@
-# Foundry Local REST API Reference
-
-> **⚠️ API UNDER DEVELOPMENT**
->
-> This API is actively being developed and may introduce breaking changes without prior notice. We recommend monitoring the changelog for updates before building production applications.
-
-## OpenAI v1 contract
-
-### POST /v1/chat/completions
-
-Handles chat completion requests.
-Compatible with the [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create)
-
-**Request Body:**
-
-_---Properties Defined by OpenAI Contract---_
-
-- `model` (string)
- The model to use for the completion.
-- `messages` (array)
- A list of messages comprising the conversation history.
- - Each message must contain:
- - `role` (string)
- The role of the author. Must be one of: `system`, `user`, or `assistant`.
- - `content` (string)
- The content of the message.
-- `temperature` (number, optional)
- Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random outputs, while lower values (e.g., 0.2) produce more focused, deterministic outputs.
-- `top_p` (number, optional)
- Nucleus sampling probability between 0 and 1. Value of 0.1 means only tokens comprising the top 10% probability mass are considered.
-- `n` (integer, optional)
- Number of chat completion choices to generate for each input message.
-- `stream` (boolean, optional)
- If true, partial message deltas will be sent as server-sent events as they become available, with the stream terminated by a `data: [DONE]` message.
-- `stop` (string or array, optional)
- Up to 4 sequences where the API will stop generating further tokens.
-- `max_tokens` (integer, optional)
- Maximum number of tokens to generate. Deprecated for o1 series models; use `max_completion_tokens` instead.
-- `max_completion_token` (integer, optional)
- Upper bound for the number of tokens to generate, including both visible output tokens and reasoning tokens.
-- `presence_penalty` (number, optional)
- Number between -2.0 and 2.0. Positive values penalize new tokens based on their presence in the text so far, increasing the model's likelihood to talk about new topics.
-- `frequency_penalty` (number, optional)
- Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text, decreasing the model's likelihood to repeat the same line verbatim.
-- `logit_bias` (map, optional)
- Modify the likelihood of specified tokens appearing in the completion.
-- `user` (string, optional)
- A unique identifier representing your end-user, which can help monitor and detect abuse.
-- `functions` (array, optional)
- A list of functions the model may generate JSON inputs for.
- - Each function must include:
- - `name` (string)
- Function name.
- - `description` (string)
- Function description.
- - `parameters` (object)
- Function parameters described as a JSON Schema object.
-- `function_call` (string or object, optional)
- Controls how the model responds to function calls.
- - If object, may include:
- - `name` (string, optional)
- The name of the function to call.
- - `arguments` (object, optional)
- The arguments to pass to the function.
-- `metadata` (object, optional)
- A dictionary of metadata key-value pairs.
-
-_---Additional Foundry Local Properties---_
-
-- `top_k` (number, optional)
- The number of highest probability vocabulary tokens to keep for top-k-filtering.
-- `random_seed` (integer, optional)
- Seed for reproducible random number generation.
-- `ep` (string, optional)
- Overwrite the provider for ONNX models. Supports: `"dml"`, `"cuda"`, `"qnn"`, `"cpu"`, `"webgpu"`.
-- `ttl` (integer, optional)
- Time to live in seconds for the model in memory.
-- `tools` (object, optional)
- Tools calculated for the request.
-
-**Response body:**
-
-- `id` (string)
- Unique identifier for the chat completion.
-- `object` (string)
- The object type, always `"chat.completion"`.
-- `created` (integer)
- Creation timestamp in epoch seconds.
-- `model` (string)
- The model used for completion.
-- `choices` (array)
- List of completion choices, each containing:
- - `index` (integer)
- The index of this choice.
- - `message` (object)
- The generated message with:
- - `role` (string)
- Always `"assistant"` for responses.
- - `content` (string)
- The actual generated text.
- - `finish_reason` (string)
- Why generation stopped (e.g., `"stop"`, `"length"`, `"function_call"`).
-- `usage` (object)
- Token usage statistics:
- - `prompt_tokens` (integer)
- Tokens in the prompt.
- - `completion_tokens` (integer)
- Tokens in the completion.
- - `total_tokens` (integer)
- Total tokens used.
-
-**Example:**
-
-- Request body
- ```json
- {
- "model": "phi-4-mini",
- "messages": [
- {
- "role": "user",
- "content": "Hello, how are you?"
- }
- ],
- "temperature": 0.7,
- "top_p": 1,
- "n": 1,
- "stream": false,
- "stop": null,
- "max_tokens": 100,
- "presence_penalty": 0,
- "frequency_penalty": 0,
- "logit_bias": {},
- "user": "user_id_123",
- "functions": [],
- "function_call": null,
- "metadata": {}
- }
- ```
-- Response body
- ```json
- {
- "id": "chatcmpl-1234567890",
- "object": "chat.completion",
- "created": 1677851234,
- "model": "phi-4-mini",
- "choices": [
- {
- "index": 0,
- "message": {
- "role": "assistant",
- "content": "I'm doing well, thank you! How can I assist you today?"
- },
- "finish_reason": "stop"
- }
- ],
- "usage": {
- "prompt_tokens": 10,
- "completion_tokens": 20,
- "total_tokens": 30
- }
- }
- ```
-
-### POST /v1/embeddings
-
-Handles embedding generation requests.
-Compatible with the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create)
-
-**Request Body:**
-
-- `model` (string)
- The embedding model to use (e.g., `"text-embedding-ada-002"`).
-- `input` (string or array)
- Input text to embed. Can be a single string or an array of strings/tokens.
-- `user` (string, optional)
- A unique identifier representing your end-user for abuse monitoring.
-
-**Response body:**
-
-- `object` (string)
- Always `"list"`.
-- `data` (array)
- List of embedding objects, each containing:
- - `object` (string)
- Always `"embedding"`.
- - `embedding` (array)
- The vector representation of the input text.
- - `index` (integer)
- The position of this embedding in the input array.
-- `model` (string)
- The model used for embedding generation.
-- `usage` (object)
- Token usage statistics:
- - `prompt_tokens` (integer)
- Number of tokens in the prompt.
- - `total_tokens` (integer)
- Total tokens used.
-
-**Example:**
-
-- Request body
- ```json
- {
- "model": "text-embedding-ada-002",
- "input": "Hello, how are you?",
- "user": "user_id_123"
- }
- ```
-- Response body
- ```json
- {
- "object": "list",
- "data": [
- {
- "object": "embedding",
- "embedding": [0.1, 0.2, 0.3, ...],
- "index": 0
- }
- ],
- "model": "text-embedding-ada-002",
- "usage": {
- "prompt_tokens": 10,
- "total_tokens": 10
- }
- }
- ```
-
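-A minimal Python sketch of calling this endpoint is shown below. It assumes the same local endpoint as above and that the embedding model from the example is available in your catalog:
-
-```python
-import requests
-
-BASE_URL = "http://localhost:5273"  # assumed local service address
-
-response = requests.post(
-    f"{BASE_URL}/v1/embeddings",
-    json={"model": "text-embedding-ada-002", "input": "Hello, how are you?"},
-)
-response.raise_for_status()
-embedding = response.json()["data"][0]["embedding"]
-print(f"Embedding length: {len(embedding)}")
-```
-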
-## Custom API
-
-### POST /openai/register
-
-Registers an external model provider for use with Foundry Local.
-
-**Request Body:**
-
-- `TypeName` (string)
- Provider name (e.g., `"deepseek"`)
-- `ModelName` (string)
- Model name to register (e.g., `"deepseek-chat"`)
-- `BaseUri` (string)
- The OpenAI-compatible base URI for the provider
-
-**Response:**
-
-- 200 OK
- Empty response body
-
-**Example:**
-
-- Request body
- ```json
- {
- "TypeName": "deepseek",
- "ModelName": "deepseek-chat",
- "BaseUri": "https://api.deepseek.com/v1"
- }
- ```
-
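-As a sketch, registration can be performed with a plain HTTP POST. The provider values below are taken from the example above; the endpoint URL is an assumption and should match your running service:
-
-```python
-import requests
-
-BASE_URL = "http://localhost:5273"  # assumed local service address
-
-resp = requests.post(
-    f"{BASE_URL}/openai/register",
-    json={
-        "TypeName": "deepseek",
-        "ModelName": "deepseek-chat",
-        "BaseUri": "https://api.deepseek.com/v1",
-    },
-)
-resp.raise_for_status()  # 200 OK with an empty body indicates success
-```
-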
-### GET /openai/models
-
-Retrieves all available models, including both local models and registered external models.
-
-**Response:**
-
-- 200 OK
- An array of model names as strings.
-
-**Example:**
-
-- Response body
- ```json
- ["phi-4-mini", "mistral-7b-v0.2"]
- ```
-
-### GET /openai/load/{name}
-
-Loads a model into memory for faster inference.
-
-**URI Parameters:**
-
-- `name` (string)
- The model name to load.
-
-**Query Parameters:**
-
-- `unload` (boolean, optional)
- Whether to automatically unload the model after idle time. Defaults to `true`.
-- `ttl` (integer, optional)
- Time to live in seconds. If greater than 0, overrides `unload` parameter.
-- `ep` (string, optional)
- Execution provider to run this model. Supports: `"dml"`, `"cuda"`, `"qnn"`, `"cpu"`, `"webgpu"`.
- If not specified, uses settings from `genai_config.json`.
-
-**Response:**
-
-- 200 OK
- Empty response body
-
-**Example:**
-
-- Request URI
- ```
- GET /openai/load/phi-4-mini?ttl=3600&ep=dml
- ```
-
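-For illustration, a hedged Python sketch of loading a model through this endpoint (endpoint URL and model name assumed):
-
-```python
-import requests
-
-BASE_URL = "http://localhost:5273"  # assumed local service address
-
-# Load phi-4-mini with a one-hour TTL on the DirectML execution provider.
-resp = requests.get(f"{BASE_URL}/openai/load/phi-4-mini", params={"ttl": 3600, "ep": "dml"})
-resp.raise_for_status()  # 200 OK with an empty body indicates success
-```
-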
-### GET /openai/unload/{name}
-
-Unloads a model from memory.
-
-**URI Parameters:**
-
-- `name` (string)
- The model name to unload.
-
-**Query Parameters:**
-
-- `force` (boolean, optional)
- If `true`, ignores TTL settings and unloads immediately.
-
-**Response:**
-
-- 200 OK
- Empty response body
-
-**Example:**
-
-- Request URI
- ```
- GET /openai/unload/phi-4-mini?force=true
- ```
-
-### GET /openai/unloadall
-
-Unloads all models from memory.
-
-**Response:**
-
-- 200 OK
- Empty response body
-
-### GET /openai/loadedmodels
-
-Retrieves a list of currently loaded models.
-
-**Response:**
-
-- 200 OK
- An array of model names as strings.
-
-**Example:**
-
-- Response body
- ```json
- ["phi-4-mini", "mistral-7b-v0.2"]
- ```
-
-### GET /openai/getgpudevice
-
-Retrieves the currently selected GPU device ID.
-
-**Response:**
-
-- 200 OK
- An integer representing the current GPU device ID.
-
-### GET /openai/setgpudevice/{deviceId}
-
-Sets the active GPU device.
-
-**URI Parameters:**
-
-- `deviceId` (integer)
- The GPU device ID to use.
-
-**Response:**
-
-- 200 OK
- Empty response body
-
-**Example:**
-
-- Request URI
- ```
- GET /openai/setgpudevice/1
- ```
-
-### POST /openai/download
-
-Downloads a model to local storage.
-
-**Request Body:**
-
-- `model` (string)
- The model name to download.
-- `token` (string, optional)
- Authentication token for protected models (GitHub or Hugging Face).
-- `progressToken` (object, optional)
- For AITK only. Token to track download progress.
-- `customDirPath` (string, optional)
- Custom download directory (used for CLI, not needed for AITK).
-- `bufferSize` (integer, optional)
- HTTP download buffer size in KB. No effect on NIM or Azure Foundry models.
-- `ignorePipeReport` (boolean, optional)
- If `true`, forces progress reporting via HTTP stream instead of pipe.
- Defaults to `false` for AITK and `true` for Foundry Local.
-
-**Streaming Response:**
-
-During download, the server streams progress updates in the format:
-
-```
-("file name", percentage_complete)
-```
-
-**Final Response body:**
-
-- `Success` (boolean)
- Whether the download completed successfully.
-- `ErrorMessage` (string, optional)
- Error details if download failed.
-
-**Example:**
-
-- Request body
-
- ```json
- {
- "model": "phi-4-mini",
- "ignorePipeReport": true
- }
- ```
-
-- Response stream
-
- ```
- ("genai_config.json", 0.01)
- ("genai_config.json", 0.2)
- ("model.onnx.data", 0.5)
- ("model.onnx.data", 0.78)
- ...
- ("", 1)
- ```
-
-- Final response
- ```json
- {
- "Success": true,
- "ErrorMessage": null
- }
- ```
-
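-A rough Python sketch of consuming the streamed progress output is shown below. It assumes the endpoint URL above and that the progress updates arrive as newline-delimited text, which may differ depending on your HTTP client:
-
-```python
-import requests
-
-BASE_URL = "http://localhost:5273"  # assumed local service address
-
-with requests.post(
-    f"{BASE_URL}/openai/download",
-    json={"model": "phi-4-mini", "ignorePipeReport": True},
-    stream=True,
-) as resp:
-    resp.raise_for_status()
-    # Progress tuples stream first; the final JSON result arrives at the end of the stream.
-    for line in resp.iter_lines(decode_unicode=True):
-        if line:
-            print(line)
-```
-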
-### GET /openai/status
-
-Retrieves server status information.
-
-**Response body:**
-
-- `Endpoints` (array of strings)
- The HTTP server binding endpoints.
-- `ModelDirPath` (string)
- Directory where local models are stored.
-- `PipeName` (string)
- The current NamedPipe server name.
-
-**Example:**
-
-- Response body
- ```json
- {
- "Endpoints": ["http://localhost:5273"],
- "ModelDirPath": "/path/to/models",
- "PipeName": "inference_agent"
- }
- ```
-
-### POST /v1/chat/completions/tokenizer/encode/count
-
-Counts tokens for a given chat completion request without performing inference.
-
-**Request Body:**
-
-- Content-Type: application/json
-- JSON object in `ChatCompletionCreateRequest` format with:
- - `model` (string)
- Model to use for tokenization.
- - `messages` (array)
- Array of message objects with `role` and `content`.
-
-**Response Body:**
-
-- Content-Type: application/json
-- JSON object with token count:
- - `tokenCount` (integer)
- Number of tokens in the request.
-
-**Example:**
-
-- Request body
- ```json
- {
- "messages": [
- {
- "role": "system",
- "content": "This is a system message"
- },
- {
- "role": "user",
- "content": "Hello, what is Microsoft?"
- }
- ],
- "model": "Phi-4-mini-instruct-cuda-gpu"
- }
- ```
-- Response body
- ```json
- {
- "tokenCount": 23
- }
- ```
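-
-A hedged Python sketch of counting tokens for a request (endpoint URL assumed; model ID taken from the example above):
-
-```python
-import requests
-
-BASE_URL = "http://localhost:5273"  # assumed local service address
-
-resp = requests.post(
-    f"{BASE_URL}/v1/chat/completions/tokenizer/encode/count",
-    json={
-        "model": "Phi-4-mini-instruct-cuda-gpu",
-        "messages": [
-            {"role": "system", "content": "This is a system message"},
-            {"role": "user", "content": "Hello, what is Microsoft?"},
-        ],
-    },
-)
-resp.raise_for_status()
-print(resp.json()["tokenCount"])
-```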
diff --git a/docs/reference/reference-sdk.md b/docs/reference/reference-sdk.md
deleted file mode 100644
index 56e70dc..0000000
--- a/docs/reference/reference-sdk.md
+++ /dev/null
@@ -1,379 +0,0 @@
-# Foundry Local Control Plane SDK Reference
-
-> **⚠️ SDK UNDER DEVELOPMENT**
->
-> This SDK is actively being developed and may introduce breaking changes without prior notice. We recommend monitoring the changelog for updates before building production applications.
-
-
-The Foundry Local Control Plane SDK simplifies AI model management in local environments by providing control-plane operations separate from data-plane inferencing code. This reference documents the SDK implementation for Python and JavaScript.
-
-## Python SDK Reference
-
-### Installation
-
-Install the Python package:
-
-```bash
-pip install foundry-local-sdk
-```
-
-### FoundryLocalManager Class
-
-The `FoundryLocalManager` class provides methods to manage models, cache, and the Foundry Local service.
-
-#### Initialization
-
-```python
-from foundry_local import FoundryLocalManager
-
-# Initialize and optionally bootstrap with a model
-manager = FoundryLocalManager(alias_or_model_id=None, bootstrap=True)
-```
-
-- `alias_or_model_id`: (optional) Alias or Model ID to download and load at startup.
-- `bootstrap`: (default True) If True, starts the service if not running and loads the model if provided.
-
-### A note on aliases
-
-Many methods outlined in this reference have an `alias_or_model_id` parameter in the signature. You can pass into the method either an **alias** or **model ID** as a value. Using an alias will:
-
-- Select the *best model* for the available hardware. For example, if an NVIDIA CUDA GPU is available, Foundry Local selects the CUDA model. If a supported NPU is available, Foundry Local selects the NPU model.
-- Allow you to use a shorter name without needing to remember the model ID.
-
-> [!TIP]
-> We recommend passing into the `alias_or_model_id` parameter an **alias** because when you deploy your application, Foundry Local acquires the best model for the end user's machine at run-time.
-
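-For example, the following sketch resolves an alias to the concrete model ID chosen for the current machine. The alias `phi-3.5-mini` is the one used elsewhere in these docs; the ID printed depends on your hardware:
-
-```python
-from foundry_local import FoundryLocalManager
-
-manager = FoundryLocalManager()
-
-# An alias resolves to the best variant for this machine (CUDA, NPU, or CPU).
-info = manager.get_model_info("phi-3.5-mini")
-print(info.id if info else "Alias not found in the catalog")
-```
-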
-### Service Management
-
-| Method | Signature | Description |
-|-----------------------|---------------------------|--------------------------------------------------|
-| `is_service_running()`| `() -> bool` | Checks if the Foundry Local service is running. |
-| `start_service()` | `() -> None` | Starts the Foundry Local service. |
-| `service_uri` | `@property -> str` | Returns the service URI. |
-| `endpoint` | `@property -> str` | Returns the service endpoint. |
-| `api_key` | `@property -> str` | Returns the API key (from env or default). |
-
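-As a small illustration of these members (a sketch; the printed values depend on your installation):
-
-```python
-from foundry_local import FoundryLocalManager
-
-# Don't start the service or load a model yet.
-manager = FoundryLocalManager(bootstrap=False)
-
-if not manager.is_service_running():
-    manager.start_service()
-
-print("Service URI:", manager.service_uri)
-print("OpenAI-compatible endpoint:", manager.endpoint)
-print("API key:", manager.api_key)
-```
-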
-### Catalog Management
-
-| Method | Signature | Description |
-|---------------------------|---------------------------------------------------|--------------------------------------------------|
-| `list_catalog_models()` | `() -> list[FoundryModelInfo]` | Lists all available models in the catalog. |
-| `refresh_catalog()` | `() -> None` | Refreshes the model catalog. |
-| `get_model_info()` | `(alias_or_model_id: str, raise_on_not_found=False) -> FoundryModelInfo or None` | Gets model info by alias or ID. |
-
-### Cache Management
-
-| Method | Signature | Description |
-|---------------------------|---------------------------------------------------|--------------------------------------------------|
-| `get_cache_location()` | `() -> str` | Returns the model cache directory path. |
-| `list_cached_models()` | `() -> list[FoundryModelInfo]` | Lists models downloaded to the local cache. |
-
-### Model Management
-
-| Method | Signature | Description |
-|-------------------------------|---------------------------------------------------------------------------|--------------------------------------------------|
-| `download_model()` | `(alias_or_model_id: str, token: str = None, force: bool = False) -> FoundryModelInfo` | Downloads a model to the local cache. |
-| `load_model()` | `(alias_or_model_id: str, ttl: int = 600) -> FoundryModelInfo` | Loads a model into the inference server. |
-| `unload_model()` | `(alias_or_model_id: str, force: bool = False) -> None` | Unloads a model from the inference server. |
-| `list_loaded_models()` | `() -> list[FoundryModelInfo]` | Lists all models currently loaded in the service.|
-
-## Example Usage
-
-The following code demonstrates how to use the `FoundryLocalManager` class to manage models and interact with the Foundry Local service.
-
-```python
-from foundry_local import FoundryLocalManager
-
-# By using an alias, the most suitable model will be selected
-# for your end-user's device.
-alias = "phi-3.5-mini"
-
-# Create a FoundryLocalManager instance. This starts the Foundry Local
-# service if it is not already running.
-manager = FoundryLocalManager()
-
-# List available models in the catalog
-catalog = manager.list_catalog_models()
-print(f"Available models in the catalog: {catalog}")
-
-# Download and load a model
-model_info = manager.download_model(alias)
-model_info = manager.load_model(alias)
-print(f"Model info: {model_info}")
-
-# List models in cache
-local_models = manager.list_cached_models()
-print(f"Models in cache: {local_models}")
-
-# List loaded models
-loaded = manager.list_loaded_models()
-print(f"Models running in the service: {loaded}")
-
-# Unload a model
-manager.unload_model(alias)
-```
-
-### Integrate with OpenAI SDK
-
-Install the OpenAI package:
-
-```bash
-pip install openai
-```
-
-The following code demonstrates how to integrate the `FoundryLocalManager` with the OpenAI SDK to interact with a local model.
-
-```python
-import openai
-from foundry_local import FoundryLocalManager
-
-# By using an alias, the most suitable model will be downloaded
-# to your end-user's device.
-alias = "phi-3.5-mini"
-
-# Create a FoundryLocalManager instance. This will start the Foundry
-# Local service if it is not already running and load the specified model.
-manager = FoundryLocalManager(alias)
-
-# The remaining code uses the OpenAI Python SDK to interact with the local model.
-
-# Configure the client to use the local Foundry service
-client = openai.OpenAI(
- base_url=manager.endpoint,
- api_key=manager.api_key # API key is not required for local usage
-)
-
-# Set the model to use and generate a streaming response
-stream = client.chat.completions.create(
- model=manager.get_model_info(alias).id,
- messages=[{"role": "user", "content": "Why is the sky blue?"}],
- stream=True
-)
-
-# Print the streaming response
-for chunk in stream:
- if chunk.choices[0].delta.content is not None:
- print(chunk.choices[0].delta.content, end="", flush=True)
-```
-
-## JavaScript SDK Reference
-
-### Installation
-
-Install the package from npm:
-
-```bash
-npm install foundry-local-sdk
-```
-
-### FoundryLocalManager Class
-
-The `FoundryLocalManager` class lets you manage models, control the cache, and interact with the Foundry Local service in both browser and Node.js environments.
-
-#### Initialization
-
-```js
-import { FoundryLocalManager } from 'foundry-local-sdk'
-
-const foundryLocalManager = new FoundryLocalManager()
-```
-
-Available options:
-- `serviceUrl`: Base URL of the Foundry Local service
-- `fetch`: (optional) Custom fetch implementation for environments like Node.js
-
-### A note on aliases
-
-Many methods outlined in this reference have an `aliasOrModelId` parameter in the signature. You can pass into the method either an **alias** or **model ID** as a value. Using an alias will:
-
-- Select the *best model* for the available hardware. For example, if an NVIDIA CUDA GPU is available, Foundry Local selects the CUDA model. If a supported NPU is available, Foundry Local selects the NPU model.
-- Allow you to use a shorter name without needing to remember the model ID.
-
-> [!TIP]
-> We recommend passing into the `aliasOrModelId` parameter an **alias** because when you deploy your application, Foundry Local acquires the best model for the end user's machine at run-time.
-
-### Service Management
-
-| Method | Signature | Description |
-|-----------------------|---------------------------|--------------------------------------------------|
-| `init()`              | `(aliasOrModelId?: string) => Promise<FoundryModelInfo \| null>` | Initializes the SDK and optionally loads a model. |
-| `isServiceRunning()`  | `() => Promise<boolean>`  | Checks if the Foundry Local service is running.  |
-| `startService()`      | `() => Promise<void>`     | Starts the Foundry Local service.                |
-| `serviceUrl` | `string` | The base URL of the Foundry Local service. |
-| `endpoint` | `string` | The API endpoint (serviceUrl + `/v1`). |
-| `apiKey`              | `string`                  | The API key (not required for local usage).      |
-
-
-### Catalog Management
-
-| Method | Signature | Description |
-|---------------------------|---------------------------------------------------------------------------|--------------------------------------------------|
-| `listCatalogModels()`     | `() => Promise<FoundryModelInfo[]>`                                        | Lists all available models in the catalog.       |
-| `refreshCatalog()`        | `() => Promise<void>`                                                      | Refreshes the model catalog.                     |
-| `getModelInfo()`          | `(aliasOrModelId: string, throwOnNotFound = false) => Promise<FoundryModelInfo \| null>` | Gets model info by alias or ID.   |
-
-
-### Cache Management
-
-| Method | Signature | Description |
-|---------------------------|---------------------------------------------------|--------------------------------------------------|
-| `getCacheLocation()`      | `() => Promise<string>`                            | Returns the model cache directory path.          |
-| `listCachedModels()`      | `() => Promise<FoundryModelInfo[]>`                | Lists models downloaded to the local cache.      |
-
-
-### Model Management
-
-| Method | Signature | Description |
-|-------------------------------|---------------------------------------------------------------------------|--------------------------------------------------|
-| `downloadModel()`             | `(aliasOrModelId: string, token?: string, force = false, onProgress?) => Promise<FoundryModelInfo>` | Downloads a model to the local cache. |
-| `loadModel()`                 | `(aliasOrModelId: string, ttl = 600) => Promise<FoundryModelInfo>`         | Loads a model into the inference server.         |
-| `unloadModel()`               | `(aliasOrModelId: string, force = false) => Promise<void>`                 | Unloads a model from the inference server.       |
-| `listLoadedModels()`          | `() => Promise<FoundryModelInfo[]>`                                        | Lists all models currently loaded in the service.|
-
-## Example Usage
-
-The following code demonstrates how to use the `FoundryLocalManager` class to manage models and interact with the Foundry Local service.
-
-```js
-import { FoundryLocalManager } from 'foundry-local-sdk'
-
-// By using an alias, the most suitable model will be downloaded
-// to your end-user's device.
-// TIP: You can find a list of available models by running the
-// following command in your terminal: `foundry model list`.
-const alias = 'phi-3.5-mini';
-
-const manager = new FoundryLocalManager()
-
-// Initialize the SDK and optionally load a model
-const modelInfo = await manager.init(alias)
-console.log('Model Info:', modelInfo)
-
-// Check if the service is running
-const isRunning = await manager.isServiceRunning()
-console.log(`Service running: ${isRunning}`)
-
-// List available models in the catalog
-const catalog = await manager.listCatalogModels()
-
-// Download and load a model
-await manager.downloadModel(alias)
-await manager.loadModel(alias)
-
-// List models in cache
-const localModels = await manager.listCachedModels()
-
-// List loaded models
-const loaded = await manager.listLoadedModels()
-
-// Unload a model
-await manager.unloadModel(alias)
-```
-
----
-
-## Integration with OpenAI Client
-
-Install the OpenAI package:
-
-```bash
-npm install openai
-```
-
-The following code demonstrates how to integrate the `FoundryLocalManager` with the OpenAI client to interact with a local model.
-
-```js
-import { OpenAI } from 'openai'
-import { FoundryLocalManager } from 'foundry-local-sdk'
-
-// By using an alias, the most suitable model will be downloaded
-// to your end-user's device.
-// TIP: You can find a list of available models by running the
-// following command in your terminal: `foundry model list`.
-const alias = 'phi-3.5-mini'
-
-// Create a FoundryLocalManager instance. This will start the Foundry
-// Local service if it is not already running.
-const foundryLocalManager = new FoundryLocalManager()
-
-// Initialize the manager with a model. This will download the model
-// if it is not already present on the user's device.
-const modelInfo = await foundryLocalManager.init(alias)
-console.log('Model Info:', modelInfo)
-
-const openai = new OpenAI({
- baseURL: foundryLocalManager.endpoint,
- apiKey: foundryLocalManager.apiKey,
-})
-
-async function streamCompletion() {
- const stream = await openai.chat.completions.create({
- model: modelInfo.id,
- messages: [{ role: 'user', content: 'What is the golden ratio?' }],
- stream: true,
-})
-
- for await (const chunk of stream) {
- if (chunk.choices[0]?.delta?.content) {
- process.stdout.write(chunk.choices[0].delta.content)
- }
- }
-}
-
-streamCompletion()
-```
-
-## Browser Usage
-
-The SDK includes a browser-compatible version where you must specify the service URL manually:
-
-```js
-import { FoundryLocalManager } from 'foundry-local-sdk/browser'
-
-// Specify the service URL
-// Run the Foundry Local service using the CLI: `foundry service start`
-// and use the URL from the CLI output
-const endpoint = 'ENDPOINT'
-
-const manager = new FoundryLocalManager({serviceUrl: endpoint})
-
-// Note: The `init`, `isServiceRunning`, and `startService` methods
-// are not available in the browser version
-```
-
-> [!NOTE]
-> The browser version doesn't support the `init`, `isServiceRunning`, and `startService` methods. You must ensure that the Foundry Local service is running before using the SDK in a browser environment. You can start the service using the Foundry Local CLI: `foundry service start`. You can glean the service URL from the CLI output.
-
-
-#### Example Usage
-
-```js
-import { FoundryLocalManager } from 'foundry-local-sdk/browser'
-
-// Specify the service URL
-// Run the Foundry Local service using the CLI: `foundry service start`
-// and use the URL from the CLI output
-const endpoint = 'ENDPOINT'
-
-const manager = new FoundryLocalManager({serviceUrl: endpoint})
-
-const alias = 'phi-3.5-mini'
-
-// Get all available models
-const catalog = await manager.listCatalogModels()
-console.log('Available models in catalog:', catalog)
-
-// Download and load a specific model
-await manager.downloadModel(alias)
-await manager.loadModel(alias)
-
-// View models in your local cache
-const localModels = await manager.listCachedModels()
-console.log('Cached models:', localModels)
-
-// Check which models are currently loaded
-const loaded = await manager.listLoadedModels()
-console.log('Loaded models in inference service:', loaded)
-
-// Unload a model when finished
-await manager.unloadModel(alias)
-```
diff --git a/docs/reference/reference-security-privacy.md b/docs/reference/reference-security-privacy.md
deleted file mode 100644
index 783893b..0000000
--- a/docs/reference/reference-security-privacy.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# Best practices and troubleshooting guide for Foundry Local
-
-This document provides best practices and troubleshooting tips for Foundry Local.
-
-## Security and privacy considerations
-
-Foundry Local is designed with privacy and security as core principles:
-
-- **Local processing**: All data processed by Foundry Local remains on your device and is never sent to Microsoft or any external services.
-- **Privacy**: see [Microsoft Privacy Statement](https://aka.ms/privacy).
-- **Air-gapped environments**: Foundry Local can be used in disconnected environments after initial model download.
-
-## Security best practices
-
-- Use Foundry Local in environments that comply with your organization's security policies.
-- When handling sensitive data, ensure your device meets your organization's security requirements.
-- Use disk encryption on devices where cached models might contain sensitive fine-tuning data.
-
-## Licensing considerations
-
-When using Foundry Local, be aware of the licensing implications for the models you run. You can view full terms of model license for each model in the model catalog using:
-
-```bash
-foundry model info --license
-```
-
-Models available through Foundry Local are subject to their original licenses:
-
-- Open-source models maintain their original licenses (e.g., Apache 2.0, MIT).
-- Commercial models may have specific usage restrictions or require separate licensing.
-- Always review the licensing information for each model before deploying in production.
-
-## Production deployment scope
-
-Foundry Local is designed for on-device inference and _not_ distributed, containerized, or multi-machine production deployments.
-
-## Troubleshooting
-
-### Common issues and solutions
-
-| Issue | Possible Cause | Solution |
-| -------------------------- | ----------------------------------------- | ----------------------------------------------------------------------------------- |
-| Slow inference | CPU-only model with large parameter count | Use GPU-optimized model variants when available |
-| Model download failures | Network connectivity issues | Check your internet connection and run `foundry cache list` to verify cache status |
-| The service fails to start | Port conflicts or permission issues | Try `foundry service restart` or report an issue with logs using `foundry zip-logs` |
-
-### Improving performance
-
-If you experience slow inference, consider the following strategies:
-
-- Use GPU acceleration when available
-- Identify bottlenecks by monitoring memory usage during inference
-- Try more quantized model variants (like INT8 instead of FP16)
-- Adjust batch sizes for non-interactive workloads
diff --git a/docs/reference/reference-troubleshooting.md b/docs/reference/reference-troubleshooting.md
deleted file mode 100644
index 7107dde..0000000
--- a/docs/reference/reference-troubleshooting.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Troubleshooting
-
-## Common issues and solutions
-
-| Issue | Possible Cause | Solution |
-| ----------------------- | --------------------------------------- | ----------------------------------------------------------------------------------------- |
-| Slow inference          | CPU-only model with a large parameter count | Use GPU-optimized model variants when available                                         |
-| Model download failures | Network connectivity issues | Check your internet connection, try `foundry cache list` to verify cache state |
-| Service won't start     | Port conflicts or permission issues      | Try `foundry service restart` or post an issue providing logs with `foundry zip-logs`       |
-| Qualcomm NPU error (`Qnn error code 5005: "Failed to load from EpContext model. qnn_backend_manager."`) | Qualcomm NPU error | Under investigation |
-| `winget install Microsoft.FoundryLocal --scope machine` fails with “The current system configuration does not support the installation of this package.” | Winget blocks MSIX machine-scope installs due to an OS bug when using provisioning APIs from a packaged context | Use `Add-AppxProvisionedPackage` instead. Download the `.msix` and its dependency, then run in **elevated** PowerShell: `Add-AppxProvisionedPackage -Online -PackagePath .\FoundryLocal.msix -DependencyPackagePath .\VcLibs.appx -SkipLicense`. This installs Foundry Local for all users.|
-| QNN graph execute error (Error 6031) | NPU model issue | Under investigation. Try using a different model or the equivalent CPU model in the meantime. |
-
-## Diagnosing performance issues
-
-If you're experiencing slow inference:
-
-1. Check that you're using GPU acceleration if available
-2. Monitor memory usage during inference to detect bottlenecks
-3. Consider a more quantized model variant (e.g., INT8 instead of FP16)
-4. Experiment with batch sizes for non-interactive workloads
diff --git a/docs/tutorials/chat-application-with-open-web-ui.md b/docs/tutorials/chat-application-with-open-web-ui.md
deleted file mode 100644
index 4de52b9..0000000
--- a/docs/tutorials/chat-application-with-open-web-ui.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# Build a chat application with Open Web UI
-
-This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you'll have a working chat interface running entirely on your local device.
-
-## Prerequisites
-
-Before you start this tutorial, you need:
-
-- **Foundry Local** [installed](../get-started.md) on your computer.
-- **At least one model loaded** with the `foundry model load` command, like this:
- ```bash
- foundry model load Phi-4-mini-instruct-cuda-gpu
- ```
-
-## Set up Open Web UI for chat
-
-1. **Install Open Web UI** by following the instructions from the [Open Web UI GitHub repository](https://github.com/open-webui/open-webui).
-
-2. **Launch Open Web UI** with this command in your terminal:
-
- ```bash
- open-webui serve
- ```
-
-3. Open your web browser and go to [http://localhost:8080](http://localhost:8080).
-
-4. Enable Direct Connections:
- 1. Select **Settings** and **Admin Settings** in the profile menu.
- 2. Select **Connections** in the navigation menu.
- 3. Enable **Direct Connections** by turning on the toggle. This allows users to connect to their own OpenAI compatible API endpoints.
-
-5. **Connect Open Web UI to Foundry Local**:
-
- 1. Select **Settings** in the navigation menu
- 2. Select **Connections**
- 3. Select **Manage Direct Connections**
- 4. Select the **+** icon to add a connection
- 5. Enter `http://localhost:PORT/v1` for the URL, where `PORT` is the port number assigned to your Foundry Local instance.
- 6. Type any value (like `test`) for the API Key, since it cannot be empty
- 7. Save your connection
-
-
-
-6. **Start chatting with your model**:
- 1. Your loaded models will appear in the dropdown at the top
- 2. Select any model from the list
- 3. Type your message in the input box at the bottom
-
-That's it! You're now chatting with an AI model running entirely on your local device.
-
-## Next steps
-
-- [Build an application with LangChain](use-langchain-with-foundry-local.md)
-- [How to compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
diff --git a/docs/tutorials/use-langchain-with-foundry-local.md b/docs/tutorials/use-langchain-with-foundry-local.md
deleted file mode 100644
index 9529f55..0000000
--- a/docs/tutorials/use-langchain-with-foundry-local.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# Build an application with LangChain
-
-This tutorial shows you how to create an application using Foundry Local and LangChain. You learn how to integrate locally hosted AI models with the popular LangChain framework.
-
-## Prerequisites
-
-Before starting this tutorial, you need:
-
-- **Foundry Local** [installed](../get-started.md) on your computer
-- **At least one model loaded** using the `Foundry Local SDK`:
- ```bash
- pip install foundry-local-sdk
- ```
- ```python
- from foundry_local import FoundryLocalManager
-  manager = FoundryLocalManager(alias_or_model_id=None, bootstrap=True)
- manager.download_model("Phi-4-mini-instruct-generic-cpu")
- manager.load_model("Phi-4-mini-instruct-generic-cpu")
- ```
-- **LangChain with OpenAI support** installed:
-
- ```bash
- pip install langchain[openai]
- ```
-
-## Create a LangChain application
-
-Foundry Local supports the OpenAI Chat Completion API, making it easy to integrate with LangChain. Here's how to build a translation application:
-
-```python
-import os
-
-from langchain_openai import ChatOpenAI
-from langchain_core.prompts import ChatPromptTemplate
-
-# Set a placeholder API key (not actually used by Foundry Local)
-if not os.environ.get("OPENAI_API_KEY"):
- os.environ["OPENAI_API_KEY"] = "no_key"
-
-# Configure ChatOpenAI to use your locally-running model, noting the port is dynamically assigned
-llm = ChatOpenAI(
- model="Phi-4-mini-instruct-generic-cpu",
- base_url="http://localhost:5273/v1/",
- temperature=0.0,
- streaming=False
-)
-
-# Create a translation prompt template
-prompt = ChatPromptTemplate.from_messages([
- (
- "system",
- "You are a helpful assistant that translates {input_language} to {output_language}."
- ),
- ("human", "{input}")
-])
-
-# Build a simple chain by connecting the prompt to the language model
-chain = prompt | llm
-
-# Run the chain with your inputs
-ai_msg = chain.invoke({
- "input_language": "English",
- "output_language": "French",
- "input": "I love programming."
-})
-
-# Display the result
-print(ai_msg)
-```
-
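-Because the Foundry Local service port is assigned dynamically, hard-coding `http://localhost:5273/v1/` may break between runs. As a sketch, you can derive the endpoint from the `FoundryLocalManager` shown in the prerequisites instead (parameter names follow the `langchain_openai` `ChatOpenAI` constructor):
-
-```python
-from foundry_local import FoundryLocalManager
-from langchain_openai import ChatOpenAI
-
-alias = "Phi-4-mini-instruct-generic-cpu"
-
-# Starts the service if needed, loads the model, and exposes the local endpoint.
-manager = FoundryLocalManager(alias)
-
-llm = ChatOpenAI(
-    model=manager.get_model_info(alias).id,
-    base_url=manager.endpoint,   # resolves to the dynamically assigned port
-    api_key=manager.api_key,
-    temperature=0.0,
-)
-```
-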
-## Next steps
-
-- Explore the [LangChain documentation](https://python.langchain.com/docs/introduction) for more advanced features and capabilities.
-- [How to compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
diff --git a/docs/what-is-foundry-local.md b/docs/what-is-foundry-local.md
deleted file mode 100644
index 2f93dc9..0000000
--- a/docs/what-is-foundry-local.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# What is Foundry Local?
-
-Foundry Local is a local version of Azure AI Foundry that enables local execution of large language models (LLMs) directly on your device. This on-device AI inference solution provides privacy, customization, and cost benefits compared to cloud-based alternatives. Best of all, it fits into your existing workflows and applications with an easy-to-use CLI and REST API!
-
-By applying the optimization work of ONNX Runtime, Olive, and the ONNX ecosystem, Foundry Local delivers a highly optimized and performant experience for running AI models locally.
-
-## Key features
-
-- **On-Device Inference**: Run LLMs locally on your own hardware, reducing dependency on cloud services while keeping your data on-device.
-- **Model Customization**: Choose from preset models or bring your own to match your specific requirements and use cases.
-- **Cost Efficiency**: Avoid recurring cloud service costs by using your existing hardware, making AI tasks more accessible.
-- **Seamless Integration**: Easily interface with your applications via an endpoint or test with the CLI, with the option to scale to Azure AI Foundry as your workload demands increase.
-
-## Use cases
-
-Foundry Local is ideal for scenarios where:
-
-- Data privacy and security are paramount
-- You need to operate in environments with limited or no internet connectivity
-- You want to reduce cloud inference costs
-- You need low-latency AI responses for real-time applications
-- You want to experiment with AI models before deploying to a cloud environment
-
-## Pricing and billing
-
-Entirely Free! You're using your own hardware, and there are no extra costs associated with running AI models locally.
-
-## How to get access
-
-Download from the Microsoft Store. (WIP)
-
-## Next steps
-
-- [Get started with Foundry Local](./get-started.md)
-- [Compile Hugging Face models for Foundry Local](./how-to/compile-models-for-foundry-local.md)
-- [Learn more about ONNX Runtime](https://onnxruntime.ai/docs/)
diff --git a/media/icons/foundry_local_black.svg b/media/icons/foundry_local_black.svg
new file mode 100644
index 0000000..96c34ce
--- /dev/null
+++ b/media/icons/foundry_local_black.svg
@@ -0,0 +1,13 @@
+
diff --git a/media/icons/foundry_local_color.svg b/media/icons/foundry_local_color.svg
new file mode 100644
index 0000000..412a6fb
--- /dev/null
+++ b/media/icons/foundry_local_color.svg
@@ -0,0 +1,40 @@
+
diff --git a/media/icons/foundry_local_white.svg b/media/icons/foundry_local_white.svg
new file mode 100644
index 0000000..e4a8b13
--- /dev/null
+++ b/media/icons/foundry_local_white.svg
@@ -0,0 +1,13 @@
+
diff --git a/samples/cs/GettingStarted/Directory.Packages.props b/samples/cs/GettingStarted/Directory.Packages.props
new file mode 100644
index 0000000..0598093
--- /dev/null
+++ b/samples/cs/GettingStarted/Directory.Packages.props
@@ -0,0 +1,15 @@
+
+
+ true
+ 0.11.0
+ 1.23.2
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/ExcludeExtraLibs.props b/samples/cs/GettingStarted/ExcludeExtraLibs.props
new file mode 100644
index 0000000..036f5e7
--- /dev/null
+++ b/samples/cs/GettingStarted/ExcludeExtraLibs.props
@@ -0,0 +1,49 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/README.md b/samples/cs/GettingStarted/README.md
new file mode 100644
index 0000000..f47cb60
--- /dev/null
+++ b/samples/cs/GettingStarted/README.md
@@ -0,0 +1,59 @@
+# 🚀 Getting started with the Foundry Local C# SDK
+
+There are two NuGet packages for the Foundry Local SDK - a WinML and a cross-platform package - that have *exactly* the same API surface but are optimized for different platforms:
+
+- **Windows**: Uses the `Microsoft.AI.Foundry.Local.WinML` package that is specific to Windows applications. The WinML package uses Windows Machine Learning to deliver optimal performance and user experience on Windows devices.
+- **Cross-Platform**: Uses the `Microsoft.AI.Foundry.Local` package, which supports cross-platform applications (Windows, Linux, macOS).
+
+> [!TIP]
+> While you can use either package on Windows, we recommend using the WinML package for Windows applications to take advantage of the Windows ML framework for optimal performance and user experience. Your end users will benefit from:
+> - a wider range of hardware acceleration options that are automatically managed by Windows ML.
+> - a smaller application package size because downloading hardware-specific libraries occurs at application runtime rather than bundled with your application.
+
+Both the WinML and cross-platform packages provide the same APIs, so you can easily switch between the two packages if you need to target multiple platforms. The samples include the following projects:
+
+- **HelloFoundryLocalSdk**: A simple console application that initializes the Foundry Local SDK, downloads a model, loads it and does chat completions.
+- **FoundryLocalWebServer**: A simple console application that shows how to set up a local OpenAI-compliant web server using the Foundry Local SDK.
+- **AudioTranscriptionExample**: A simple console application that demonstrates how to use the Foundry Local SDK for audio transcription tasks.
+- **ModelManagementExample**: A simple console application that demonstrates how to manage models - such as variant selection and updates - using the Foundry Local SDK.
+
+## Running the samples
+
+1. Clone the Foundry Local repository from GitHub.
+ ```bash
+ git clone https://github.com/microsoft/Foundry-Local.git
+ ```
+2. Open and run the samples.
+
+ **Windows:**
+ 1. Open the `Foundry-Local/samples/cs/GettingStarted/windows/FoundrySamplesWinML.sln` solution in Visual Studio or your preferred IDE.
+ 1. If you're using Visual Studio, run any of the sample projects (e.g., `HelloFoundryLocalSdk`) by selecting the project in the Solution Explorer and selecting the **Start** button (or pressing **F5**).
+
+   Alternatively, you can run the projects using the .NET CLI. For x64 (update the `<ProjectName>` as needed):
+   ```bash
+   cd Foundry-Local/samples/cs/GettingStarted/windows
+   dotnet run --project <ProjectName>/<ProjectName>.csproj -r:win-x64
+   ```
+   or for ARM64:
+   ```bash
+   cd Foundry-Local/samples/cs/GettingStarted/windows
+   dotnet run --project <ProjectName>/<ProjectName>.csproj -r:win-arm64
+   ```
+
+
+ **macOS or Linux:**
+ 1. Open the `Foundry-Local/samples/cs/GettingStarted/cross-platform/FoundrySamplesXPlatform.sln` solution in Visual Studio Code or your preferred IDE.
+   1. Run the project using the .NET CLI (update the `<ProjectName>` and `<RID>` as needed):
+      ```bash
+      cd Foundry-Local/samples/cs/GettingStarted/cross-platform
+      dotnet run --project <ProjectName>/<ProjectName>.csproj -r:<RID>
+      ```
+ For example, to run the `HelloFoundryLocalSdk` project on macOS (Apple Silicon), use the following command:
+
+ ```bash
+ cd Foundry-Local/samples/cs/GettingStarted/cross-platform
+ dotnet run --project HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj -r:osx-arm64
+ ```
+
+
diff --git a/samples/cs/GettingStarted/cross-platform/AudioTranscriptionExample/AudioTranscriptionExample.csproj b/samples/cs/GettingStarted/cross-platform/AudioTranscriptionExample/AudioTranscriptionExample.csproj
new file mode 100644
index 0000000..8f837ce
--- /dev/null
+++ b/samples/cs/GettingStarted/cross-platform/AudioTranscriptionExample/AudioTranscriptionExample.csproj
@@ -0,0 +1,31 @@
+
+
+
+ Exe
+ net9.0
+ enable
+ enable
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ PreserveNewest
+
+
+
+
+
+
+
diff --git a/samples/cs/GettingStarted/cross-platform/FoundryLocalWebServer/FoundryLocalWebServer.csproj b/samples/cs/GettingStarted/cross-platform/FoundryLocalWebServer/FoundryLocalWebServer.csproj
new file mode 100644
index 0000000..45f6d5c
--- /dev/null
+++ b/samples/cs/GettingStarted/cross-platform/FoundryLocalWebServer/FoundryLocalWebServer.csproj
@@ -0,0 +1,25 @@
+
+
+
+ Exe
+ net9.0
+ enable
+ enable
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/samples/cs/GettingStarted/cross-platform/FoundrySamplesXPlatform.sln b/samples/cs/GettingStarted/cross-platform/FoundrySamplesXPlatform.sln
new file mode 100644
index 0000000..eddab62
--- /dev/null
+++ b/samples/cs/GettingStarted/cross-platform/FoundrySamplesXPlatform.sln
@@ -0,0 +1,50 @@
+
+Microsoft Visual Studio Solution File, Format Version 12.00
+# Visual Studio Version 17
+VisualStudioVersion = 17.14.36705.20
+MinimumVisualStudioVersion = 10.0.40219.1
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "HelloFoundryLocalSdk", "HelloFoundryLocalSdk\HelloFoundryLocalSdk.csproj", "{785AAE8A-8CD6-4916-B858-29B8A7EF8FF2}"
+EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "build", "build", "{8EC462FD-D22E-90A8-E5CE-7E832BA40C5D}"
+ ProjectSection(SolutionItems) = preProject
+ ..\Directory.Packages.props = ..\Directory.Packages.props
+ ..\ExcludeExtraLibs.props = ..\ExcludeExtraLibs.props
+ ..\nuget.config = ..\nuget.config
+ EndProjectSection
+EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "FoundryLocalWebServer", "FoundryLocalWebServer\FoundryLocalWebServer.csproj", "{D1D6C453-3088-4D8D-B320-24D718601C26}"
+EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "AudioTranscriptionExample", "AudioTranscriptionExample\AudioTranscriptionExample.csproj", "{2FAD8210-8AEB-4063-9C61-57B7AD26772D}"
+EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ModelManagementExample", "ModelManagementExample\ModelManagementExample.csproj", "{AAD0233C-9FDD-46A7-9428-2F72BC76D38E}"
+EndProject
+Global
+ GlobalSection(SolutionConfigurationPlatforms) = preSolution
+ Debug|Any CPU = Debug|Any CPU
+ Release|Any CPU = Release|Any CPU
+ EndGlobalSection
+ GlobalSection(ProjectConfigurationPlatforms) = postSolution
+ {785AAE8A-8CD6-4916-B858-29B8A7EF8FF2}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+ {785AAE8A-8CD6-4916-B858-29B8A7EF8FF2}.Debug|Any CPU.Build.0 = Debug|Any CPU
+ {785AAE8A-8CD6-4916-B858-29B8A7EF8FF2}.Release|Any CPU.ActiveCfg = Release|Any CPU
+ {785AAE8A-8CD6-4916-B858-29B8A7EF8FF2}.Release|Any CPU.Build.0 = Release|Any CPU
+ {D1D6C453-3088-4D8D-B320-24D718601C26}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+ {D1D6C453-3088-4D8D-B320-24D718601C26}.Debug|Any CPU.Build.0 = Debug|Any CPU
+ {D1D6C453-3088-4D8D-B320-24D718601C26}.Release|Any CPU.ActiveCfg = Release|Any CPU
+ {D1D6C453-3088-4D8D-B320-24D718601C26}.Release|Any CPU.Build.0 = Release|Any CPU
+ {2FAD8210-8AEB-4063-9C61-57B7AD26772D}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+ {2FAD8210-8AEB-4063-9C61-57B7AD26772D}.Debug|Any CPU.Build.0 = Debug|Any CPU
+ {2FAD8210-8AEB-4063-9C61-57B7AD26772D}.Release|Any CPU.ActiveCfg = Release|Any CPU
+ {2FAD8210-8AEB-4063-9C61-57B7AD26772D}.Release|Any CPU.Build.0 = Release|Any CPU
+ {AAD0233C-9FDD-46A7-9428-2F72BC76D38E}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+ {AAD0233C-9FDD-46A7-9428-2F72BC76D38E}.Debug|Any CPU.Build.0 = Debug|Any CPU
+ {AAD0233C-9FDD-46A7-9428-2F72BC76D38E}.Release|Any CPU.ActiveCfg = Release|Any CPU
+ {AAD0233C-9FDD-46A7-9428-2F72BC76D38E}.Release|Any CPU.Build.0 = Release|Any CPU
+ EndGlobalSection
+ GlobalSection(SolutionProperties) = preSolution
+ HideSolutionNode = FALSE
+ EndGlobalSection
+ GlobalSection(ExtensibilityGlobals) = postSolution
+ SolutionGuid = {9FC1F302-B28C-4CAB-8ABA-24FA9EBBED6F}
+ EndGlobalSection
+EndGlobal
diff --git a/samples/cs/GettingStarted/cross-platform/HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj b/samples/cs/GettingStarted/cross-platform/HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj
new file mode 100644
index 0000000..7bb0ef9
--- /dev/null
+++ b/samples/cs/GettingStarted/cross-platform/HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj
@@ -0,0 +1,23 @@
+
+
+
+ Exe
+ net9.0
+ enable
+ enable
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/samples/cs/GettingStarted/cross-platform/ModelManagementExample/ModelManagementExample.csproj b/samples/cs/GettingStarted/cross-platform/ModelManagementExample/ModelManagementExample.csproj
new file mode 100644
index 0000000..bca8d51
--- /dev/null
+++ b/samples/cs/GettingStarted/cross-platform/ModelManagementExample/ModelManagementExample.csproj
@@ -0,0 +1,24 @@
+
+
+
+ Exe
+ net9.0
+ enable
+ enable
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/samples/cs/GettingStarted/nuget.config b/samples/cs/GettingStarted/nuget.config
new file mode 100644
index 0000000..c9dd917
--- /dev/null
+++ b/samples/cs/GettingStarted/nuget.config
@@ -0,0 +1,16 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/src/AudioTranscriptionExample/Program.cs b/samples/cs/GettingStarted/src/AudioTranscriptionExample/Program.cs
new file mode 100644
index 0000000..34eecf9
--- /dev/null
+++ b/samples/cs/GettingStarted/src/AudioTranscriptionExample/Program.cs
@@ -0,0 +1,66 @@
+using Microsoft.AI.Foundry.Local;
+
+var config = new Configuration
+{
+ AppName = "foundry_local_samples",
+ LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
+};
+
+
+// Initialize the singleton instance.
+await FoundryLocalManager.CreateAsync(config, Utils.GetAppLogger());
+var mgr = FoundryLocalManager.Instance;
+
+
+// Ensure that any Execution Provider (EP) downloads run and are completed.
+// EP packages include dependencies and may be large.
+// Download is only required again if a new version of the EP is released.
+// For cross platform builds there is no dynamic EP download and this will return immediately.
+await Utils.RunWithSpinner("Registering execution providers", mgr.EnsureEpsDownloadedAsync());
+
+
+// Get the model catalog
+var catalog = await mgr.GetCatalogAsync();
+
+
+// Get a model using an alias and select the CPU model variant
+var model = await catalog.GetModelAsync("whisper-tiny") ?? throw new System.Exception("Model not found");
+var modelVariant = model.Variants.First(v => v.Info.Runtime?.DeviceType == DeviceType.CPU);
+model.SelectVariant(modelVariant);
+
+
+// Download the model (the method skips download if already cached)
+await model.DownloadAsync(progress =>
+{
+ Console.Write($"\rDownloading model: {progress:F2}%");
+ if (progress >= 100f)
+ {
+ Console.WriteLine();
+ }
+});
+
+
+// Load the model
+Console.Write($"Loading model {model.Id}...");
+await model.LoadAsync();
+Console.WriteLine("done.");
+
+
+// Get a chat client
+var audioClient = await model.GetAudioClientAsync();
+
+
+// Get a transcription with streaming outputs
+Console.WriteLine("Transcribing audio with streaming output:");
+var response = audioClient.TranscribeAudioStreamingAsync("Recording.mp3", CancellationToken.None);
+await foreach (var chunk in response)
+{
+ Console.Write(chunk.Text);
+ Console.Out.Flush();
+}
+
+Console.WriteLine();
+
+
+// Tidy up - unload the model
+await model.UnloadAsync();
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/src/AudioTranscriptionExample/Recording.mp3 b/samples/cs/GettingStarted/src/AudioTranscriptionExample/Recording.mp3
new file mode 100644
index 0000000..deb3841
Binary files /dev/null and b/samples/cs/GettingStarted/src/AudioTranscriptionExample/Recording.mp3 differ
diff --git a/samples/cs/GettingStarted/src/FoundryLocalWebServer/Program.cs b/samples/cs/GettingStarted/src/FoundryLocalWebServer/Program.cs
new file mode 100644
index 0000000..2be8296
--- /dev/null
+++ b/samples/cs/GettingStarted/src/FoundryLocalWebServer/Program.cs
@@ -0,0 +1,82 @@
+using Microsoft.AI.Foundry.Local;
+using OpenAI;
+using System.ClientModel;
+
+var config = new Configuration
+{
+ AppName = "foundry_local_samples",
+ LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information,
+ Web = new Configuration.WebService
+ {
+ Urls = "http://127.0.0.1:55588"
+ }
+};
+
+
+// Initialize the singleton instance.
+await FoundryLocalManager.CreateAsync(config, Utils.GetAppLogger());
+var mgr = FoundryLocalManager.Instance;
+
+
+// Ensure that any Execution Provider (EP) downloads run and are completed.
+// EP packages include dependencies and may be large.
+// Download is only required again if a new version of the EP is released.
+// For cross platform builds there is no dynamic EP download and this will return immediately.
+await Utils.RunWithSpinner("Registering execution providers", mgr.EnsureEpsDownloadedAsync());
+
+
+// Get the model catalog
+var catalog = await mgr.GetCatalogAsync();
+
+
+// Get a model using an alias
+var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
+// Download the model (the method skips download if already cached)
+await model.DownloadAsync(progress =>
+{
+ Console.Write($"\rDownloading model: {progress:F2}%");
+ if (progress >= 100f)
+ {
+ Console.WriteLine();
+ }
+});
+
+
+// Load the model
+Console.Write($"Loading model {model.Id}...");
+await model.LoadAsync();
+Console.WriteLine("done.");
+
+
+// Start the web service
+Console.Write($"Starting web service on {config.Web.Urls}...");
+await mgr.StartWebServiceAsync();
+Console.WriteLine("done.");
+
+// <<<<<< OPEN AI SDK USAGE >>>>>>
+// Use the OpenAI SDK to call the local Foundry web service
+
+ApiKeyCredential key = new ApiKeyCredential("notneeded");
+OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions
+{
+ Endpoint = new Uri(config.Web.Urls + "/v1"),
+});
+
+var chatClient = client.GetChatClient(model.Id);
+var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue?");
+
+Console.Write($"[ASSISTANT]: ");
+foreach (var completionUpdate in completionUpdates)
+{
+ if (completionUpdate.ContentUpdate.Count > 0)
+ {
+ Console.Write(completionUpdate.ContentUpdate[0].Text);
+ }
+}
+Console.WriteLine();
+// <<<<<< END OPEN AI SDK USAGE >>>>>>
+
+// Tidy up
+// Stop the web service and unload model
+await mgr.StopWebServiceAsync();
+await model.UnloadAsync();
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/src/HelloFoundryLocalSdk/Program.cs b/samples/cs/GettingStarted/src/HelloFoundryLocalSdk/Program.cs
new file mode 100644
index 0000000..52efe41
--- /dev/null
+++ b/samples/cs/GettingStarted/src/HelloFoundryLocalSdk/Program.cs
@@ -0,0 +1,67 @@
+using Microsoft.AI.Foundry.Local;
+using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
+
+CancellationToken ct = new CancellationToken();
+
+var config = new Configuration
+{
+ AppName = "foundry_local_samples",
+ LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
+};
+
+
+// Initialize the singleton instance.
+await FoundryLocalManager.CreateAsync(config, Utils.GetAppLogger());
+var mgr = FoundryLocalManager.Instance;
+
+
+// Ensure that any Execution Provider (EP) downloads run and are completed.
+// EP packages include dependencies and may be large.
+// Download is only required again if a new version of the EP is released.
+// For cross platform builds there is no dynamic EP download and this will return immediately.
+await Utils.RunWithSpinner("Registering execution providers", mgr.EnsureEpsDownloadedAsync());
+
+
+// Get the model catalog
+var catalog = await mgr.GetCatalogAsync();
+
+
+// Get a model using an alias.
+var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
+
+// Download the model (the method skips download if already cached)
+await model.DownloadAsync(progress =>
+{
+ Console.Write($"\rDownloading model: {progress:F2}%");
+ if (progress >= 100f)
+ {
+ Console.WriteLine();
+ }
+});
+
+// Load the model
+Console.Write($"Loading model {model.Id}...");
+await model.LoadAsync();
+Console.WriteLine("done.");
+
+// Get a chat client
+var chatClient = await model.GetChatClientAsync();
+
+// Create a chat message
+List<ChatMessage> messages = new()
+{
+ new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
+};
+
+// Get a streaming chat completion response
+Console.WriteLine("Chat completion response:");
+var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
+await foreach (var chunk in streamingResponse)
+{
+ Console.Write(chunk.Choices[0].Message.Content);
+ Console.Out.Flush();
+}
+Console.WriteLine();
+
+// Tidy up - unload the model
+await model.UnloadAsync();
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/src/ModelManagementExample/Program.cs b/samples/cs/GettingStarted/src/ModelManagementExample/Program.cs
new file mode 100644
index 0000000..bfeb6b1
--- /dev/null
+++ b/samples/cs/GettingStarted/src/ModelManagementExample/Program.cs
@@ -0,0 +1,152 @@
+using Microsoft.AI.Foundry.Local;
+using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
+using System.Diagnostics;
+
+CancellationToken ct = new CancellationToken();
+
+var config = new Configuration
+{
+ AppName = "foundry_local_samples",
+ LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
+};
+
+
+// Initialize the singleton instance.
+await FoundryLocalManager.CreateAsync(config, Utils.GetAppLogger());
+var mgr = FoundryLocalManager.Instance;
+
+
+// Ensure that any Execution Provider (EP) downloads run and are completed.
+// EP packages include dependencies and may be large.
+// Download is only required again if a new version of the EP is released.
+// For cross platform builds there is no dynamic EP download and this will return immediately.
+await Utils.RunWithSpinner("Registering execution providers", mgr.EnsureEpsDownloadedAsync());
+
+
+// Model catalog operations
+// In this section of the code we demonstrate the various model catalog operations
+// Get the model catalog object
+var catalog = await mgr.GetCatalogAsync();
+
+// List available models
+Console.WriteLine("Available models for your hardware:");
+var models = await catalog.ListModelsAsync();
+foreach (var availableModel in models)
+{
+ foreach (var variant in availableModel.Variants)
+ {
+ Console.WriteLine($" - Alias: {variant.Alias} (Id: {string.Join(", ", variant.Id)})");
+ }
+}
+
+// List cached models (i.e. downloaded models) from the catalog
+var cachedModels = await catalog.GetCachedModelsAsync();
+Console.WriteLine("\nCached models:");
+foreach (var cachedModel in cachedModels)
+{
+ Console.WriteLine($"- {cachedModel.Alias} ({cachedModel.Id})");
+}
+
+
+// Get a model using an alias from the catalog
+var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
+
+// `model.SelectedVariant` indicates which variant will be used by default.
+//
+// Models in Model.Variants are ordered by priority, with the highest priority first.
+// The first downloaded model is selected by default.
+// The highest priority is selected if no models have been downloaded.
+// If the selected variant is not the highest priority, it means that Foundry Local
+// has found a locally cached variant for you to improve performance (remove need to download).
+Console.WriteLine("\nThe default selected model variant is: " + model.Id);
+if (model.SelectedVariant != model.Variants.First())
+{
+ Debug.Assert(await model.SelectedVariant.IsCachedAsync());
+ Console.WriteLine("The model variant was selected due to being locally cached.");
+}
+
+
+// OPTIONAL: `model` can be used directly and `model.SelectedVariant` will be used as the default.
+// You can explicitly select or use a specific ModelVariant if you want more control
+// over the device and/or execution provider used.
+// Model and ModelVariant can be used interchangeably in methods such as
+// DownloadAsync, LoadAsync, UnloadAsync and GetChatClientAsync.
+//
+// Choices:
+// - Use a ModelVariant directly from the catalog if you know the variant Id
+// - `var modelVariant = await catalog.GetModelVariantAsync("qwen2.5-0.5b-instruct-generic-gpu:3")`
+//
+// - Get the ModelVariant from Model.Variants
+// - `var modelVariant = model.Variants.First(v => v.Id == "qwen2.5-0.5b-instruct-generic-cpu:4")`
+// - `var modelVariant = model.Variants.First(v => v.Info.Runtime?.DeviceType == DeviceType.GPU)`
+// - optional: update selected variant in `model` using `model.SelectVariant(modelVariant);` if you wish to use
+// `model` in your code.
+
+// For this example we explicitly select the CPU variant, and call SelectVariant so all the following example code
+// uses the `model` instance.
+Console.WriteLine("Selecting CPU variant of model");
+var modelVariant = model.Variants.First(v => v.Info.Runtime?.DeviceType == DeviceType.CPU);
+model.SelectVariant(modelVariant);
+
+
+// Download the model (the method skips download if already cached)
+await model.DownloadAsync(progress =>
+{
+ Console.Write($"\rDownloading model: {progress:F2}%");
+ if (progress >= 100f)
+ {
+ Console.WriteLine();
+ }
+});
+
+// Load the model
+await model.LoadAsync();
+
+
+// List loaded models (i.e. in memory) from the catalog
+var loadedModels = await catalog.GetLoadedModelsAsync();
+Console.WriteLine("\nLoaded models:");
+foreach (var loadedModel in loadedModels)
+{
+ Console.WriteLine($"- {loadedModel.Alias} ({loadedModel.Id})");
+}
+Console.WriteLine();
+
+
+// Get a chat client
+var chatClient = await model.GetChatClientAsync();
+
+// Create a chat message
+List<ChatMessage> messages = new()
+{
+ new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
+};
+
+// You can adjust settings on the chat client
+chatClient.Settings.Temperature = 0.7f;
+chatClient.Settings.N = 512;
+
+Console.WriteLine("Chat completion response:");
+var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
+await foreach (var chunk in streamingResponse)
+{
+ Console.Write(chunk.Choices[0].Message.Content);
+ Console.Out.Flush();
+}
+Console.WriteLine();
+Console.WriteLine();
+
+// Tidy up - unload the model
+Console.WriteLine($"Unloading model {model.Id}...");
+await model.UnloadAsync();
+Console.WriteLine("Model unloaded.");
+
+// Show loaded models from the catalog after unload
+loadedModels = await catalog.GetLoadedModelsAsync();
+Console.WriteLine("\nLoaded models after unload (will be empty):");
+foreach (var loadedModel in loadedModels)
+{
+ Console.WriteLine($"- {loadedModel.Alias} ({loadedModel.Id})");
+}
+Console.WriteLine();
+Console.WriteLine("Sample complete.");
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/src/Shared/Utils.cs b/samples/cs/GettingStarted/src/Shared/Utils.cs
new file mode 100644
index 0000000..b9c0fcf
--- /dev/null
+++ b/samples/cs/GettingStarted/src/Shared/Utils.cs
@@ -0,0 +1,54 @@
+using Microsoft.Extensions.Logging;
+using System.Text;
+
+internal static class Utils
+{
+ private static readonly ILoggerFactory _loggerFactory;
+
+ static Utils()
+ {
+ _loggerFactory = Microsoft.Extensions.Logging.LoggerFactory.Create(builder =>
+ {
+ builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
+ });
+ }
+
+ /// <summary>
+ /// Get a dummy application logger.
+ /// </summary>
+ /// <returns>ILogger</returns>
+ internal static ILogger GetAppLogger()
+ {
+ return _loggerFactory.CreateLogger("FoundryLocalSamples");
+ }
+
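+ /// <summary>
+ /// Awaits <paramref name="workTask"/> while showing a console spinner next to <paramref name="msg"/>,
+ /// stopping the spinner once the task completes. For example (using a model from the catalog sample):
+ /// <c>await Utils.RunWithSpinner("Loading model", model.LoadAsync());</c>
+ /// </summary>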
+ internal static async Task RunWithSpinner<T>(string msg, T workTask) where T : Task
+ {
+ // Start the spinner
+ using var cts = new CancellationTokenSource();
+ var spinnerTask = ShowSpinner(msg, cts.Token);
+
+ await workTask; // wait for the real work to finish
+ cts.Cancel(); // stop the spinner
+ await spinnerTask; // wait for spinner to exit
+ }
+
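+ /// <summary>
+ /// Writes an animated spinner character after <paramref name="msg"/> every 200 ms
+ /// until <paramref name="token"/> is cancelled.
+ /// </summary>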
+ private static async Task ShowSpinner(string msg, CancellationToken token)
+ {
+ Console.OutputEncoding = Encoding.UTF8;
+
+ var sequence = new[] { '◴','◷','◶','◵' };
+
+ int counter = 0;
+
+ while (!token.IsCancellationRequested)
+ {
+ Console.Write($"{msg}\t{sequence[counter % sequence.Length]}");
+ Console.SetCursorPosition(0, Console.CursorTop);
+ counter++;
+ await Task.Delay(200, token).ContinueWith(_ => { });
+ }
+
+ Console.WriteLine($"\nDone.\n");
+ }
+}
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/windows/AudioTranscriptionExample/AudioTranscriptionExample.csproj b/samples/cs/GettingStarted/windows/AudioTranscriptionExample/AudioTranscriptionExample.csproj
new file mode 100644
index 0000000..b40f74a
--- /dev/null
+++ b/samples/cs/GettingStarted/windows/AudioTranscriptionExample/AudioTranscriptionExample.csproj
@@ -0,0 +1,33 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <Nullable>enable</Nullable>
+
+    <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
+    <!-- ... -->
+    <Platforms>ARM64;x64</Platforms>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <!-- ... -->
+  </ItemGroup>
+
+  <ItemGroup>
+    <Content Include="...">
+      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
+    </Content>
+  </ItemGroup>
+
+</Project>
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/windows/FoundryLocalWebServer/FoundryLocalWebServer.csproj b/samples/cs/GettingStarted/windows/FoundryLocalWebServer/FoundryLocalWebServer.csproj
new file mode 100644
index 0000000..94fd583
--- /dev/null
+++ b/samples/cs/GettingStarted/windows/FoundryLocalWebServer/FoundryLocalWebServer.csproj
@@ -0,0 +1,27 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <Nullable>enable</Nullable>
+
+    <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
+    <!-- ... -->
+    <Platforms>x64;ARM64</Platforms>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <!-- ... -->
+  </ItemGroup>
+
+</Project>
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/windows/FoundrySamplesWinML.sln b/samples/cs/GettingStarted/windows/FoundrySamplesWinML.sln
new file mode 100644
index 0000000..176c3a4
--- /dev/null
+++ b/samples/cs/GettingStarted/windows/FoundrySamplesWinML.sln
@@ -0,0 +1,68 @@
+
+Microsoft Visual Studio Solution File, Format Version 12.00
+# Visual Studio Version 17
+VisualStudioVersion = 17.14.36705.20
+MinimumVisualStudioVersion = 10.0.40219.1
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "HelloFoundryLocalSdk", "HelloFoundryLocalSdk\HelloFoundryLocalSdk.csproj", "{72ABF21E-2BFD-412A-9039-A594B392F00C}"
+EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "FoundryLocalWebServer", "FoundryLocalWebServer\FoundryLocalWebServer.csproj", "{77026F3A-25E0-40AB-B941-2A6252E13A35}"
+EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "AudioTranscriptionExample", "AudioTranscriptionExample\AudioTranscriptionExample.csproj", "{80F60523-40E1-4743-A256-974B21A9C6AB}"
+EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "build", "build", "{8EC462FD-D22E-90A8-E5CE-7E832BA40C5D}"
+ ProjectSection(SolutionItems) = preProject
+ ..\Directory.Packages.props = ..\Directory.Packages.props
+ ..\ExcludeExtraLibs.props = ..\ExcludeExtraLibs.props
+ ..\nuget.config = ..\nuget.config
+ EndProjectSection
+EndProject
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ModelManagementExample", "ModelManagementExample\ModelManagementExample.csproj", "{6BBA4217-6798-4629-AF27-6526FCC5FA5B}"
+EndProject
+Global
+ GlobalSection(SolutionConfigurationPlatforms) = preSolution
+ Debug|ARM64 = Debug|ARM64
+ Debug|x64 = Debug|x64
+ Release|ARM64 = Release|ARM64
+ Release|x64 = Release|x64
+ EndGlobalSection
+ GlobalSection(ProjectConfigurationPlatforms) = postSolution
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Debug|ARM64.ActiveCfg = Debug|ARM64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Debug|ARM64.Build.0 = Debug|ARM64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Debug|x64.ActiveCfg = Debug|x64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Debug|x64.Build.0 = Debug|x64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Release|ARM64.ActiveCfg = Release|ARM64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Release|ARM64.Build.0 = Release|ARM64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Release|x64.ActiveCfg = Release|x64
+ {72ABF21E-2BFD-412A-9039-A594B392F00C}.Release|x64.Build.0 = Release|x64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Debug|ARM64.ActiveCfg = Debug|ARM64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Debug|ARM64.Build.0 = Debug|ARM64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Debug|x64.ActiveCfg = Debug|x64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Debug|x64.Build.0 = Debug|x64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Release|ARM64.ActiveCfg = Release|ARM64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Release|ARM64.Build.0 = Release|ARM64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Release|x64.ActiveCfg = Release|x64
+ {77026F3A-25E0-40AB-B941-2A6252E13A35}.Release|x64.Build.0 = Release|x64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Debug|ARM64.ActiveCfg = Debug|ARM64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Debug|ARM64.Build.0 = Debug|ARM64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Debug|x64.ActiveCfg = Debug|x64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Debug|x64.Build.0 = Debug|x64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Release|ARM64.ActiveCfg = Release|ARM64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Release|ARM64.Build.0 = Release|ARM64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Release|x64.ActiveCfg = Release|x64
+ {80F60523-40E1-4743-A256-974B21A9C6AB}.Release|x64.Build.0 = Release|x64
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Debug|ARM64.ActiveCfg = Debug|Any CPU
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Debug|ARM64.Build.0 = Debug|Any CPU
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Debug|x64.ActiveCfg = Debug|x64
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Debug|x64.Build.0 = Debug|x64
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Release|ARM64.ActiveCfg = Release|Any CPU
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Release|ARM64.Build.0 = Release|Any CPU
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Release|x64.ActiveCfg = Release|Any CPU
+ {6BBA4217-6798-4629-AF27-6526FCC5FA5B}.Release|x64.Build.0 = Release|Any CPU
+ EndGlobalSection
+ GlobalSection(SolutionProperties) = preSolution
+ HideSolutionNode = FALSE
+ EndGlobalSection
+ GlobalSection(ExtensibilityGlobals) = postSolution
+ SolutionGuid = {17462B72-2BD9-446A-8E57-E313251686D9}
+ EndGlobalSection
+EndGlobal
diff --git a/samples/cs/GettingStarted/windows/HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj b/samples/cs/GettingStarted/windows/HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj
new file mode 100644
index 0000000..52b0ea0
--- /dev/null
+++ b/samples/cs/GettingStarted/windows/HelloFoundryLocalSdk/HelloFoundryLocalSdk.csproj
@@ -0,0 +1,27 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <Nullable>enable</Nullable>
+
+    <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
+    <!-- ... -->
+    <Platforms>ARM64;x64</Platforms>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <!-- ... -->
+  </ItemGroup>
+
+</Project>
\ No newline at end of file
diff --git a/samples/cs/GettingStarted/windows/ModelManagementExample/ModelManagementExample.csproj b/samples/cs/GettingStarted/windows/ModelManagementExample/ModelManagementExample.csproj
new file mode 100644
index 0000000..e336023
--- /dev/null
+++ b/samples/cs/GettingStarted/windows/ModelManagementExample/ModelManagementExample.csproj
@@ -0,0 +1,26 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <Nullable>enable</Nullable>
+
+    <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
+    <!-- ... -->
+    <Platforms>ARM64;x64</Platforms>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <!-- ... -->
+  </ItemGroup>
+
+</Project>
\ No newline at end of file
diff --git a/samples/dotNET/rag/foundry-local-architecture.md b/samples/dotNET/rag/foundry-local-architecture.md
deleted file mode 100644
index 6b04f79..0000000
--- a/samples/dotNET/rag/foundry-local-architecture.md
+++ /dev/null
@@ -1,116 +0,0 @@
-# Foundry Local Architecture
-
-Foundry Local is designed to enable efficient, secure, and scalable AI model inference directly on local devices. This article explains the key components of the Foundry Local architecture and how they interact to deliver AI capabilities.
-
-The benefits of Foundry Local include:
-
-- **Low Latency**: By running models locally, Foundry Local minimizes the time it takes to process requests and return results.
-- **Data Privacy**: Sensitive data can be processed locally without sending it to the cloud, ensuring compliance with data protection regulations.
-- **Flexibility**: Foundry Local supports a wide range of hardware configurations, allowing users to choose the best setup for their needs.
-- **Scalability**: Foundry Local can be deployed on various devices, from personal computers to powerful servers, making it suitable for different use cases.
-- **Cost-Effectiveness**: Running models locally can reduce costs associated with cloud computing, especially for high-volume applications.
-- **Offline Capabilities**: Foundry Local can operate without an internet connection, making it ideal for remote or disconnected environments.
-- **Integration with Existing Workflows**: Foundry Local can be easily integrated into existing development and deployment workflows, allowing for a smooth transition to local inference.
-
-## Key Components
-
-The key components of the Foundry Local architecture are articulated in the following diagram:
-
-
-
-### Foundry Local Service
-
-The Foundry Local Service is an OpenAI compatible REST server that provides a standardized interface for interacting with the inference engine and model management. Developers can use this API to send requests, run models, and retrieve results programmatically.
-
-- **Endpoint**: `http://localhost:PORT/v1`
- - Note: The port is dynamically assigned, so check the logs for the correct port.
-- **Use Cases**:
- - Integrating Foundry Local with custom applications.
- - Running models via HTTP requests.
-
-### ONNX Runtime
-
-The ONNX runtime is a core component responsible for running AI models. It uses optimized ONNX models to perform inference efficiently on local hardware, such as CPUs, GPUs, or NPUs.
-
-**Features**:
-
-- Supports multiple hardware providers (for example: NVIDIA, AMD, Intel) and devices (for example: NPUs, CPUs, GPUs).
-- Provides a unified interface for running models on different hardware platforms.
-- Best-in-class performance.
-- Supports quantized models for faster inference.
-
-### Model Management
-
-Foundry Local provides robust tools for managing AI models, ensuring that they're readily available for inference and easy to maintain. Model management is handled through the **Model Cache** and the **Command-Line Interface (CLI)**.
-
-#### Model Cache
-
-The model cache is a local storage system where AI models are downloaded and stored. It ensures that models are available for inference without requiring repeated downloads. The cache can be managed using the Foundry CLI or REST API.
-
-- **Purpose**: Reduces latency by storing models locally.
-- **Management Commands**:
- - `foundry cache list`: Lists all models stored in the local cache.
- - `foundry cache remove `: Deletes a specific model from the cache.
- - `foundry cache cd `: Changes the directory where models are stored.
-
-#### Model Lifecycle
-
-1. **Download**: Models are downloaded from the Azure AI Foundry model catalog to local disk.
-2. **Load**: Models are loaded into the Foundry Local service (and therefore memory) for inference. You can set a TTL (time-to-live) for how long the model should remain in memory (the default is 10 minutes).
-3. **Run**: Models are inferenced.
-4. **Unload**: Models can be unloaded from the inference engine to free up resources.
-5. **Delete**: Models can be deleted from the local cache to free up disk space.
-
-#### Model Compilation using Olive
-
-Before models can be used with Foundry Local, they must be compiled and optimized in the [ONNX](https://onnx.ai) format. Microsoft provides a selection of published models in the Azure AI Foundry Model Catalog that are already optimized for Foundry Local. However, you aren't limited to those models - by using [Olive](https://microsoft.github.io/Olive/). Olive is a powerful framework for preparing AI models for efficient inference. It converts models into the ONNX format, optimizes their graph structure, and applies techniques like quantization to improve performance on local hardware.
-
-**💡 TIP**: To learn more about compiling models for Foundry Local, read [Compile Hugging Face models for Foundry Local](../how-to/compile-models-for-foundry-local.md).
-
-### Hardware Abstraction Layer
-
-The hardware abstraction layer ensures that Foundry Local can run on various devices by abstracting the underlying hardware. To optimize performance based on the available hardware, Foundry Local supports:
-
-- **multiple _execution providers_**, such as NVIDIA CUDA, AMD, Qualcomm, Intel.
-- **multiple _device types_**, such as CPU, GPU, NPU.
-
-### Developer Experiences
-
-The Foundry Local architecture is designed to provide a seamless developer experience, enabling easy integration and interaction with AI models.
-
-Developers can choose from various interfaces to interact with the system, including:
-
-#### Command-Line Interface (CLI)
-
-The Foundry CLI is a powerful tool for managing models, the inference engine, and the local cache.
-
-**Examples**:
-
-- `foundry model list`: Lists all available models in the local cache.
-- `foundry model run `: Runs a model.
-- `foundry service status`: Checks the status of the service.
-
-**💡 TIP**: To learn more about the CLI commands, read [Foundry Local CLI Reference](../reference/reference-cli.md).
-
-#### Inferencing SDK Integration
-
-Foundry Local supports integration with various SDKs, such as the OpenAI SDK, enabling developers to use familiar programming interfaces to interact with the local inference engine.
-
-- **Supported SDKs**: Python, JavaScript, C#, and more.
-
-**💡 TIP**: To learn more about integrating with inferencing SDKs, read [Integrate Foundry Local with Inferencing SDKs](../how-to/integrate-with-inference-sdks.md).
-
-#### AI Toolkit for Visual Studio Code
-
-The AI Toolkit for Visual Studio Code provides a user-friendly interface for developers to interact with Foundry Local. It allows users to run models, manage the local cache, and visualize results directly within the IDE.
-
-- **Features**:
- - Model management: Download, load, and run models from within the IDE.
- - Interactive console: Send requests and view responses in real-time.
- - Visualization tools: Graphical representation of model performance and results.
-
-## Next Steps
-
-- [Get started with Foundry Local](../get-started.md)
-- [Integrate with Inference SDKs](../how-to/integrate-with-inference-sdks.md)
-- [Foundry Local CLI Reference](../reference/reference-cli.md)
diff --git a/samples/dotNET/rag/README.md b/samples/rag/README.md
similarity index 100%
rename from samples/dotNET/rag/README.md
rename to samples/rag/README.md
diff --git a/docs/concepts/foundry-local-architecture.md b/samples/rag/foundry-local-architecture.md
similarity index 100%
rename from docs/concepts/foundry-local-architecture.md
rename to samples/rag/foundry-local-architecture.md
diff --git a/samples/dotNET/rag/rag_foundrylocal_demo.ipynb b/samples/rag/rag_foundrylocal_demo.ipynb
similarity index 100%
rename from samples/dotNET/rag/rag_foundrylocal_demo.ipynb
rename to samples/rag/rag_foundrylocal_demo.ipynb