8 Research Papers Microsoft is Presenting at NeurIPS 2023

NeurIPS, one of the most important AI conferences, has kicked off its 37th edition. The gathering matters to the AI/ML community because it offers a bellwether for the state of the art and for where the field may be heading.

As usual, many companies and institutions in the field are presenting their research at the conference in New Orleans. Here are eight of the papers Microsoft is presenting at the event this year:

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Diffusion models excel at generating images, and Microsoft researchers have extended them to text generation. Unlike images, however, text is inherently sequential. To account for this, the researchers introduced Auto-Regressive Diffusion (AR-Diffusion), which ensures that the generation of tokens on the right depends on the tokens already generated to their left.

In experiments on tasks such as text summarisation and machine translation, AR-Diffusion outperformed existing diffusion language models and, when achieving comparable results, was 100 to 600 times faster.

The code is available on GitHub.
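To make the left-to-right dependency concrete, here is a minimal Python sketch of position-dependent noise levels. It is an illustrative assumption, not the schedule from the paper: each token needs `T` denoising steps, and token `n` starts `lag * n` global steps after the first token, so tokens on the left are always at least as clean as tokens on the right. The function names and parameters are hypothetical.

```python
def token_timesteps(step: int, seq_len: int, T: int, lag: int) -> list[int]:
    """Noise level of each token at a given global decoding step.

    0 means fully denoised and T means pure noise. Hypothetical linear
    schedule: token n begins denoising `lag * n` steps after token 0.
    """
    return [min(T, max(0, T - (step - lag * n))) for n in range(seq_len)]


def total_steps(seq_len: int, T: int, lag: int) -> int:
    # The rightmost token starts last and still needs T steps of its own.
    return T + lag * (seq_len - 1)


print(token_timesteps(3, 4, T=4, lag=2))  # → [1, 3, 4, 4]
```

At every step the list is non-decreasing from left to right, which is the property the paragraph above describes: a token is only denoised once the tokens to its left are further along.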

Dissecting In-Context Learning of Translations in GPTs

Most recent work on using large language models for machine translation has focused on selecting the few-shot examples used in prompts. This Microsoft study instead examines which properties of those in-context examples matter, by perturbing high-quality, in-domain demonstrations.

The researchers found that perturbing the target side of the demonstrations significantly degrades translation quality, suggesting that the distribution of the output text provides the most important learning signal during in-context learning. They also introduce a method called Zero-Shot-Context, which adds this signal automatically in zero-shot prompting and improves the zero-shot translation performance of models like GPT-3, making it competitive with few-shot prompted translations.
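The perturbation experiment can be pictured with a short Python sketch that assembles a few-shot prompt and perturbs only one side of each demonstration. The German-English pair, the prompt template and the word-shuffling perturbation are our own illustrative assumptions; the paper's exact templates and perturbation types are not described here, and the call to the language model is left out.

```python
import random


def build_prompt(demos, source, src_lang="German", tgt_lang="English"):
    """Assemble a few-shot translation prompt from (source, target) pairs."""
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in demos]
    blocks.append(f"{src_lang}: {source}\n{tgt_lang}:")  # the model completes this line
    return "\n\n".join(blocks)


def shuffle_words(text, rng):
    """Hypothetical perturbation: randomly reorder the words of a sentence."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def perturb_demos(demos, side, seed=0):
    """Perturb only the source side or only the target side of each demonstration."""
    rng = random.Random(seed)
    perturbed = []
    for src, tgt in demos:
        if side == "source":
            src = shuffle_words(src, rng)
        elif side == "target":
            tgt = shuffle_words(tgt, rng)
        perturbed.append((src, tgt))
    return perturbed


demos = [
    ("Das Wetter ist heute schön.", "The weather is nice today."),
    ("Ich trinke gern Kaffee.", "I like to drink coffee."),
]
prompt = build_prompt(perturb_demos(demos, side="target"), "Wo ist der Bahnhof?")
print(prompt)
```

Comparing model outputs for `side="source"` and `side="target"` prompts is the kind of asymmetric comparison the study describes.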

TextDiffuser: Diffusion Models as Text Painters

TextDiffuser improves the rendering of legible text in generated images using two stages: first, a Transformer model generates a layout for the keywords in the text prompt, and then diffusion models generate the image conditioned on the prompt and that layout. The researchers also introduce MARIO-10M, a dataset of 10 million image-text pairs with OCR annotations. TextDiffuser proves flexible: it can create high-quality text images from prompts alone or together with template images, and it can fill in missing text in incomplete images.
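One way to picture the hand-off between the two stages is to rasterise the layout into a character-level mask that the diffusion model can be conditioned on. The Python sketch below assumes axis-aligned keyword boxes, splits each box evenly among its characters and uses a hypothetical lowercase-plus-digits alphabet; TextDiffuser's actual layout model and mask format may differ.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # hypothetical label set


def layout_to_char_mask(layout, height, width):
    """Rasterise keyword boxes into a character-level segmentation mask.

    layout: list of (keyword, (x0, y0, x1, y1)) with integer pixel coordinates,
    x1/y1 exclusive. Returns a height x width grid in which 0 is background and
    k > 0 is the 1-based ALPHABET index of the character covering that pixel.
    """
    mask = [[0] * width for _ in range(height)]
    for word, (x0, y0, x1, y1) in layout:
        n = len(word)
        for i, ch in enumerate(word.lower()):
            label = ALPHABET.find(ch) + 1  # characters outside ALPHABET map to 0
            # Split the box width evenly among the characters (integer arithmetic).
            cx0 = x0 + i * (x1 - x0) // n
            cx1 = x0 + (i + 1) * (x1 - x0) // n
            for y in range(y0, y1):
                for x in range(cx0, cx1):
                    mask[y][x] = label
    return mask


for row in layout_to_char_mask([("hi", (0, 0, 4, 2))], height=3, width=5):
    print(row)
# [8, 8, 9, 9, 0]
# [8, 8, 9, 9, 0]
# [0, 0, 0, 0, 0]
```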

A Theory of Unsupervised Translation Motivated by Understanding Animal Communication

In this research, Microsoft researchers introduce a framework for analysing Unsupervised Machine Translation (UMT) when no parallel data is available and the source and target languages may be unrelated, a setting motivated by efforts to understand animal communication. The framework requires only a prior probability distribution that assigns probabilities to potential translations. The researchers instantiate it with two stylised models of language and find that translation accuracy depends on the complexity of the source language and on how much common ground it shares with the target language.

They also derive bounds on the amount of source-language data needed for unsupervised translation. Surprisingly, for these language models, the required amount is comparable to what supervised translation needs.
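The role of the prior can be sketched in Python as a likelihood-maximising selection rule in the spirit of the framework: among candidate translators, keep the one whose translations of the source corpus the target-language prior finds most probable. The finite candidate set, the word-by-word translators, the floor probability and the toy prior are hypothetical simplifications, not the paper's construction.

```python
import math

FLOOR = 1e-9  # probability assigned to sentences the toy prior has never seen


def log_prior(sentence, prior):
    """Log-probability of a target-language sentence under a toy prior."""
    return math.log(prior.get(sentence, FLOOR))


def translate(sentence, translator):
    """Word-by-word translation with a candidate dictionary."""
    return " ".join(translator[token] for token in sentence.split())


def select_translator(source_corpus, candidates, prior):
    """Pick the candidate whose translations of the source corpus are most
    likely under the target-language prior (a maximum-likelihood rule)."""
    return max(
        candidates,
        key=lambda f: sum(log_prior(translate(s, f), prior) for s in source_corpus),
    )


# Hypothetical toy data: source "utterances" made of abstract symbols.
source_corpus = ["A B", "A C"]
prior = {"whale sings": 0.5, "whale dives": 0.3, "calf follows": 0.2}
candidates = [
    {"A": "sings", "B": "whale", "C": "calf"},
    {"A": "whale", "B": "sings", "C": "dives"},
]
best = select_translator(source_corpus, candidates, prior)
print(best)  # → {'A': 'whale', 'B': 'sings', 'C': 'dives'}
```

The first candidate produces sentences the prior considers implausible, so it loses even though no parallel data was used to rule it out.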

(S)GD over Diagonal Linear Networks: Implicit bias, Large Stepsizes and Edge of Stability

In this paper, the researchers investigate how stochasticity and step size in gradient descent (GD) and stochastic gradient descent (SGD) affect the implicit regularisation of diagonal linear networks.

The study shows that larger step sizes consistently benefit SGD in sparse regression problems but can hinder the recovery of sparse solutions for GD. These effects are amplified in the "edge of stability" regime, where step sizes lie in a tight window just below the threshold at which training becomes unstable.
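For readers unfamiliar with the setup, a diagonal linear network predicts with `X @ (u * v)`, so the regression weights are the elementwise product of two parameter vectors. The NumPy sketch below trains it with full-batch GD or minibatch SGD on a synthetic sparse regression problem. The initialisation (`u = alpha`, `v = 0`), the data and the step size are illustrative assumptions; the snippet only shows the mechanics and does not reproduce the paper's step-size or edge-of-stability results.

```python
import numpy as np


def train_diagonal_net(X, y, lr, steps, alpha=0.1, batch_size=None, seed=0):
    """(S)GD on 0.5 * mean((X @ (u * v) - y) ** 2) for a diagonal linear network.

    batch_size=None runs full-batch GD; an integer runs minibatch SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    u = np.full(d, alpha)  # small initialisation scale (assumption)
    v = np.zeros(d)
    for _ in range(steps):
        if batch_size is None:
            Xb, yb = X, y
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
        residual = Xb @ (u * v) - yb
        grad_beta = Xb.T @ residual / len(yb)  # gradient w.r.t. beta = u * v
        # Chain rule through beta = u * v; both factors are updated simultaneously.
        u, v = u - lr * grad_beta * v, v - lr * grad_beta * u
    return u * v


def loss(X, y, beta):
    return 0.5 * np.mean((X @ beta - y) ** 2)


# Synthetic sparse regression: 20 samples, 30 features, 3 non-zero weights.
rng = np.random.default_rng(1)
n, d = 20, 30
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = 1.0
y = X @ beta_star

beta_gd = train_diagonal_net(X, y, lr=0.01, steps=3000)
beta_sgd = train_diagonal_net(X, y, lr=0.01, steps=3000, batch_size=1)
print("initial loss:", loss(X, y, np.zeros(d)))
print("GD loss:", loss(X, y, beta_gd), "SGD loss:", loss(X, y, beta_sgd))
```

Raising `lr` toward the divergence threshold is the regime the paper studies; the sketch makes that easy to experiment with, without any claim about what this particular toy will show.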

Adversarial Model for Offline Reinforcement Learning

Here, the researchers introduce ARMOR, a novel offline reinforcement learning framework that adversarially trains a model to optimise a policy for its worst-case performance relative to a reference policy. As a result, ARMOR can robustly improve on the reference even when the data covers it incompletely, while remaining competitive with the best policy the available data supports. In practical tests, ARMOR is competitive with existing methods without using model ensembles, and it consistently improves on reference policies across various hyperparameter settings.
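The relative-pessimism idea can be illustrated on a one-step (bandit) problem. In this hypothetical NumPy sketch, a handful of fixed reward models stand in for the models consistent with the data, and each candidate policy is scored by its worst-case value advantage over the reference policy. ARMOR learns its adversarial model rather than enumerating a fixed set, so treat this only as an illustration of the objective.

```python
import numpy as np


def relative_pessimistic_choice(policies, models, ref_index):
    """Choose the policy with the best worst-case advantage over a reference.

    policies: (P, A) array, each row a distribution over A actions.
    models:   (M, A) array, each row a reward vector from a model that is
              (hypothetically) consistent with the offline data.
    """
    values = policies @ models.T              # (P, M) expected reward per model
    advantage = values - values[ref_index]    # improvement over the reference
    worst_case = advantage.min(axis=1)        # adversarial model for each policy
    return int(np.argmax(worst_case)), worst_case


# Toy example: the data covers actions 0 and 1, so the models agree there,
# but they disagree about the unobserved action 2.
models = np.array([[0.5, 0.8, 1.0],
                   [0.5, 0.8, 0.0]])
policies = np.eye(3)  # deterministic policies: always pick action 0, 1 or 2
best, worst_case = relative_pessimistic_choice(policies, models, ref_index=0)
print(best)  # → 1
```

Because the reference scores exactly zero against itself under every model, the chosen policy's worst-case advantage can never be negative, which is the "never worse than the reference" behaviour in miniature. The optimistic action 2 is rejected because one plausible model says it is bad.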

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors

Over time, deployed models decay due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviours are identified, targeted edits are required. However, current model editors, which adjust specific behaviours of pre-trained models, degrade model performance over multiple edits.

Researchers at Microsoft have proposed GRACE, a lifelong model editing method that spot-fixes a deployed model's errors as they stream in, with minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering the model's weights. According to the researchers, it is the first method to enable thousands of sequential edits using only streaming errors.

The code is available on GitHub.
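The codebook idea can be sketched in a few lines of NumPy. This is a simplified, hypothetical adaptor for a single layer: each edit stores a key (the layer's hidden state for an erroneous input) and a replacement value. Hidden states within a radius of a key are swapped for its value; everything else passes through unchanged, and no weights are touched. The class name, the fixed radius and the Euclidean distance are our assumptions, and the sketch omits any rules for updating radii or resolving overlapping edits.

```python
import numpy as np


class GraceCodebook:
    """Hypothetical, simplified GRACE-style adaptor wrapped around one layer."""

    def __init__(self, epsilon=1.0):
        self.epsilon = epsilon  # fixed radius around each key (assumption)
        self.keys = []          # hidden states that triggered edits
        self.values = []        # replacement hidden states

    def add_edit(self, hidden, new_value):
        """Record a spot-fix for one streaming error; weights are not modified."""
        self.keys.append(np.asarray(hidden, dtype=float))
        self.values.append(np.asarray(new_value, dtype=float))

    def __call__(self, hidden):
        """Apply the nearest edit if `hidden` lies within its radius."""
        hidden = np.asarray(hidden, dtype=float)
        if not self.keys:
            return hidden
        dists = [np.linalg.norm(hidden - key) for key in self.keys]
        nearest = int(np.argmin(dists))
        if dists[nearest] <= self.epsilon:
            return self.values[nearest]  # edited behaviour
        return hidden                    # unrelated input: unchanged


layer_adaptor = GraceCodebook(epsilon=0.5)
layer_adaptor.add_edit(hidden=[0.0, 0.0], new_value=[5.0, 5.0])
print(layer_adaptor([0.1, 0.0]))  # inside the radius → [5. 5.]
print(layer_adaptor([3.0, 3.0]))  # outside the radius → [3. 3.]
```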

A Unified Model and Dimension for Interactive Estimation

In this research, the scientists study interactive estimation, an abstract framework for interactive learning in which the goal is to estimate a target from its similarity to points queried by the learner.

They introduce a combinatorial measure called the dissimilarity dimension, which largely captures how learnable a problem is in this framework. They also show that the framework unifies two classic learning models, statistical-query learning and structured bandits, and explain how the dissimilarity dimension relates to well-known parameters in both, in some cases yielding improved analyses.
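A toy version of the interactive loop helps make the setting concrete. In the Python sketch below, the learner queries a point, observes its noise-free similarity to the hidden target, and discards every candidate inconsistent with that observation. The bit-agreement similarity and the fixed query list are hypothetical, and this simple elimination loop is not the general algorithm the paper analyses.

```python
def agreement(query, candidate, bits=4):
    """Hypothetical similarity: number of bit positions where the two agree."""
    mask = (1 << bits) - 1
    return bin(~(query ^ candidate) & mask).count("1")


def interactive_estimate(candidates, queries, similarity, oracle):
    """Noise-free elimination: keep candidates consistent with every answer."""
    version_space = list(candidates)
    for q in queries:
        if len(version_space) <= 1:
            break
        observed = oracle(q)  # similarity between q and the hidden target
        version_space = [c for c in version_space if similarity(q, c) == observed]
    return version_space


target = 11  # hidden from the learner
result = interactive_estimate(
    candidates=range(16),
    queries=[0, 1, 2, 4, 8],
    similarity=agreement,
    oracle=lambda q: agreement(q, target),
)
print(result)  # → [11]
```

Query 0 reveals how many bits of the target are set, and each single-bit query then reveals one bit, so the candidate set shrinks until only the target remains.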

The post 8 Research Papers Microsoft is Presenting at NeurIPS 2023 appeared first on Analytics India Magazine.
