VPTQ
Vector Post-Training Quantization (VPTQ) is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on LLMs at an extremely low bit-width (
Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
EASIER is a domain-specific language embedded in PyTorch that automatically scales physical simulations up and out. It just-in-time (JIT) distributes tensor dataflows that describe physical simulations to any number of workers and compiles them…
Developed by Microsoft Research, BitNet b1.58 2B4T is the first open-source, native 1-bit large language model (LLM) in which every parameter is ternary (i.e., -1, 0, 1), at a 2-billion parameter scale. Trained on a…
LLM2CLIP is a novel approach that embraces the power of LLMs to unlock CLIP’s potential. By fine-tuning the LLM in the caption space with contrastive learning, we extract its textual capabilities into the output embeddings,…
Aurora is a machine learning model that can predict atmospheric variables, such as temperature. It is a foundation model, which means that it was first generally trained on a lot of data and then can…
vAttention is a memory manager for KV-cache in LLM serving systems. It decouples the allocation of virtual memory and physical memory using the CUDA virtual memory APIs. This approach enables allocating physical memory on demand…
Research and development (R&D) is crucial for enhancing industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating…
RepoClassBench (RCB) is a repository-level code-generation benchmark. Retrieve-RepoTools-Reflect (RRR) is a framework for code generation that uses large language models (LLMs) with static-analysis tools in an agent setup.
Trace is a new AutoDiff-like tool for training AI systems end-to-end with general feedback (like numerical rewards or losses, natural language text, compiler errors, etc.). Trace generalizes the back-propagation algorithm by capturing and propagating an…
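To make the VPTQ entry above more concrete, here is a minimal sketch of plain vector quantization, the general idea that VPTQ builds on. It is not VPTQ's actual algorithm: the function names `quantize` and `dequantize`, the hand-picked codebook, and the sample weights are all hypothetical and chosen only for illustration. Each group of `dim` weights is stored as the index of its nearest codebook vector, so the index cost is log2(codebook size) / `dim` bits per weight, not counting the codebook itself.

```python
from math import dist


def quantize(weights, codebook, dim):
    """Replace each length-`dim` chunk of `weights` with the index of its nearest codebook vector."""
    if len(weights) % dim != 0:
        raise ValueError("number of weights must be a multiple of dim")
    indices = []
    for start in range(0, len(weights), dim):
        chunk = weights[start:start + dim]
        # Nearest neighbour by Euclidean distance.
        nearest = min(range(len(codebook)), key=lambda i: dist(chunk, codebook[i]))
        indices.append(nearest)
    return indices


def dequantize(indices, codebook):
    """Rebuild an approximate weight list by looking up each index in the codebook."""
    restored = []
    for i in indices:
        restored.extend(codebook[i])
    return restored


# Hypothetical 4-entry codebook of 2-dimensional vectors (an assumption, not learned from data).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
weights = [0.9, 1.1, -0.8, 0.7, 0.1, -0.2, 1.2, -0.9]

indices = quantize(weights, codebook, dim=2)
restored = dequantize(indices, codebook)
print(indices)
print(restored)
```

With 4 codebook entries and 2 weights per vector, each index takes 2 bits and covers 2 weights, which works out to 1 bit per weight in this toy setup. A real method would learn the codebook from the model's weights rather than fixing it by hand.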