Rethinking Academic Integrity in the AI Era: A Comparative Analysis of ChatGPT and University Students in 32 Courses

By Aneesh Tickoo - September 7, 2023

Generative AI refers to artificial intelligence (AI) that uses machine learning algorithms trained on previously created text, audio, or visual material to produce new content. Many people now view this sector as a “game-changer that society and industry need to be ready for” due to recent breakthroughs in the area and its previously unheard-of accessibility. For instance, Stable Diffusion and DALL-E have drawn much attention in the art world for their ability to produce works in various genres. Another generative AI technology, Amper Music, has previously been used to compose entire albums and generate songs in any genre.

The most recent tool in this area is ChatGPT, which can produce human-like textual replies to various prompts in several languages. More precisely, it does so in a conversational way, allowing users to organically build on earlier prompts in the form of a continuous dialogue. The tool has been dubbed an “extraordinary hit” and a “revolution in productivity” for its almost unlimited value in out-of-the-box applications including creative writing, marketing, customer service, and journalism, to name a few. With ChatGPT hitting one million users in only five days after its debut and surging to over 100 million monthly users in just two months, the tool’s capabilities have attracted much attention.

Despite its impressive capabilities, generative AI has been dogged by ethical issues. There has been continuous discussion over who owns the vast amounts of online data used to train generative AI models. Additionally, as these tools develop, it becomes more difficult to distinguish between human and algorithmic creations. ChatGPT’s capacity to produce essays and assignment solutions has prompted debates in education over academic integrity infractions by high school and university students. In the United States, for instance, school districts in New York City, Los Angeles, and Baltimore have prohibited its use.

Similarly, Australian universities have stated that they want to return to “pen and paper” exams to discourage students from using the technology to write essays. Because many instructors are worried about plagiarism, academics at institutions including George Washington University, Rutgers University, and Appalachian State University have decided to phase out take-home, open-book assignments completely. Several conferences and journals have also prohibited the use of ChatGPT to produce academic writing, which is not unexpected considering that abstracts created by ChatGPT have been shown to be difficult to distinguish from human-written material.

However, several people have defended and even advocated the use of ChatGPT to enhance writing productivity. In education, previous research has examined the effectiveness and utility of large language models in various fields, including medicine and healthcare, computer and data science, law, business, journalism and media, and language acquisition. Although these studies found mixed results when comparing ChatGPT’s performance on standardized tests to that of students, studies that specifically compared the model’s performance to that of prior large language models all found that question-answering ability had significantly improved.

In an earlier evaluation of ChatGPT on the US Medical Licensing Examination, researchers found that it performed at or near the passing threshold on each of the exam’s three steps without any additional specialized training or reinforcement. Similarly, others tested ChatGPT on the US Fundamentals of Engineering exam to assess its performance in an engineering context. Their study showed that the model’s performance fluctuated across the exam’s different sections, scoring highly in some, such as Professional Practice and Ethics, and poorly in others, such as Hydrology.

Despite these instances, a systematic investigation contrasting ChatGPT’s performance with that of students from different academic areas at the same university is still missing from the literature. It also remains unclear where students and instructors around the world stand on the use of this technology. Finally, it is uncertain whether ChatGPT-generated assignment solutions are detectable. Here, researchers from New York University Abu Dhabi compare ChatGPT’s performance to that of students in 32 university-level courses from eight different disciplines to analyze its potential as a tool for plagiarism. They also investigate the feasibility of an obfuscation approach that can be used to evade algorithms specifically designed to detect ChatGPT-generated text.

They surveyed participants (N=1601) from five countries, namely Brazil, India, Japan, the United Kingdom, and the United States, to better understand the perspectives of students and educators on both the usefulness of ChatGPT and the ethical and normative problems raised by its use. They also conducted more in-depth surveys of 151 undergraduate students and 60 professors at the authors’ university to examine how different fields view ChatGPT. They discovered that ChatGPT performs as well as, if not better than, students in nine of the 32 courses. They also find that current detection algorithms frequently misclassify ChatGPT-generated answers as human-written rather than AI-generated.

To make matters worse, an obfuscation attack renders these algorithms useless, missing 95% of ChatGPT responses. Finally, there appears to be agreement among students that they will utilize ChatGPT for their academic work and among instructors that doing so will be treated as plagiarism. Given the inherent tension between these two, educational institutions must develop acceptable academic integrity regulations for generative AI generally and ChatGPT particularly. In the era of generative AI, their findings provide contemporary insights that could direct policy talks regarding educational reform.


Alibaba Introduces Two Open-Source Large Vision Language Models (LVLM): Qwen-VL and Qwen-VL-Chat

By Niharika Singh - September 6, 2023

In the ever-evolving realm of artificial intelligence, the persistent challenge has been to bridge the gap between image comprehension and text interaction, a conundrum that has left many searching for innovative solutions. While the AI community has witnessed remarkable strides in recent years, there remains a pressing need for versatile, open-source models that can understand images and respond to complex queries with finesse.

Existing solutions have indeed paved the way for advancements in AI, but they often fall short in seamlessly blending image understanding and text interaction. These limitations have fueled the quest for more sophisticated models that can take on the multifaceted demands of image-text processing.

Alibaba introduces two open-source large vision-language models (LVLMs): Qwen-VL and Qwen-VL-Chat. These AI tools have emerged as promising answers to the challenge of comprehending images and addressing intricate queries.

Qwen-VL, the first of these models, is designed to be the sophisticated offspring of Alibaba’s 7-billion-parameter model, Tongyi Qianwen. It showcases an exceptional ability to process images and text prompts seamlessly, excelling in tasks such as crafting captivating image captions and responding to open-ended queries linked to diverse images.

Qwen-VL-Chat, on the other hand, takes the concept further by tackling more intricate interactions. Empowered by advanced alignment techniques, this AI model demonstrates a remarkable array of talents, from composing poetry and narratives based on input images to solving complex mathematical questions embedded within images. It redefines the possibilities of text-image interaction in both English and Chinese.

The capabilities of these models are underscored by impressive metrics. Qwen-VL, for instance, handled larger input images (448×448 resolution) during training, surpassing similar models limited to smaller images (224×224 resolution). It also displayed prowess in image-and-language tasks such as describing photos without prior information, answering questions about images, and detecting objects within them.

Qwen-VL-Chat, on the other hand, outperformed other AI tools in understanding and discussing the relationship between words and images, as demonstrated in a benchmark test set by Alibaba Cloud. With over 300 photographs, 800 questions, and 27 different categories, it showcased its excellence in conversations about pictures in both Chinese and English.

Perhaps the most exciting aspect of this development is Alibaba’s commitment to open-source technologies. The company intends to provide these two AI models as open-source solutions to the global community, making them freely accessible worldwide. This move empowers developers and researchers to harness these cutting-edge capabilities for AI applications without the need for extensive system training, ultimately reducing expenses and democratizing access to advanced AI tools.
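For developers who want to experiment once the weights are released, a minimal sketch of how Qwen-VL-Chat might be loaded through the Hugging Face transformers library is shown below. It assumes the checkpoint is published as "Qwen/Qwen-VL-Chat" and that the `from_list_format()` and `chat()` helpers are provided by the checkpoint's bundled remote code; exact names and arguments may differ between releases.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint ships custom modeling/tokenization code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Interleave an image reference and a text prompt; from_list_format is a helper
# assumed to be provided by the checkpoint's remote code.
query = tokenizer.from_list_format([
    {"image": "demo_image.jpg"},  # placeholder path or URL
    {"text": "Describe this image."},
])

# chat() keeps a conversation history, so follow-up questions build on earlier turns.
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

response, history = model.chat(tokenizer, "What objects stand out the most?", history=history)
print(response)
```

Because `trust_remote_code=True` executes code shipped with the checkpoint, it should only be used with repositories you trust.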

In conclusion, Alibaba’s introduction of Qwen-VL and Qwen-VL-Chat represents a significant step forward in the field of AI, addressing the longstanding challenge of seamlessly integrating image comprehension and text interaction. These open-source models, with their impressive capabilities, have the potential to reshape the landscape of AI applications, fostering innovation and accessibility across the globe. As the AI community eagerly awaits the release of these models, the future of AI-driven image-text processing looks promising and full of possibilities.


Meet RoboPianist: A New Benchmarking Suite for High-Dimensional Control in Piano Mastery with Simulated Robot Hands

By Tanushree Shenwai - July 19, 2023

Gauging progress in control and reinforcement learning is quite challenging. A particularly underserved area has been robust benchmarks that focus on high-dimensional control, including, in particular, perhaps the ultimate “challenge problem” of high-dimensional robotics: mastering bi-manual (two-handed), multi-fingered control. At the same time, some benchmarking efforts in control and reinforcement learning have begun to aggregate tasks and explore different aspects of the problem in depth. Despite decades of research into imitating the dexterity of the human hand, high-dimensional control in robots continues to be a major difficulty.

A group of researchers from UC Berkeley, Google, DeepMind, Stanford University, and Simon Fraser University presents ROBOPIANIST, a new benchmark suite for high-dimensional control. In their work, bi-manual simulated anthropomorphic robot hands are tasked with playing various songs, conditioned on sheet music in the form of a Musical Instrument Digital Interface (MIDI) transcription. The two robot hands have 44 actuators in total, 22 per hand, and, like human hands, are slightly underactuated.

Playing a song well requires sequencing actions in ways that exhibit many of the qualities of high-dimensional control policies, including:

1. Spatial and temporal precision.

2. Coordination of two hands and ten fingers.

3. Strategic planning of key presses to make subsequent key presses easier.

The original ROBOPIANIST-repertoire-150 benchmark comprises 150 songs, each serving as a standalone virtual task. Through comprehensive experiments, the researchers study the performance envelope of model-free reinforcement learning (RL) and model-based (model predictive control, MPC) methods. The results suggest that, despite leaving much room for improvement, the learned policies can deliver strong performances.
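To give a feel for how such a benchmark is typically driven, the sketch below runs a random policy for one episode. It assumes the open-source robopianist package exposes a suite loader and dm_env-style time steps (the benchmark is built on MuJoCo/dm_control); the module path, loader call, and environment name are assumptions rather than confirmed API.

```python
import numpy as np

# Assumed import path and loader; environments are expected to follow the
# dm_env interface since the benchmark is built on MuJoCo / dm_control.
from robopianist import suite

# Illustrative environment name: each song in the repertoire is its own task.
env = suite.load("RoboPianist-debug-TwinkleTwinkleRousseau-v0")

spec = env.action_spec()  # 44 actuators in total (22 per hand), per the paper
timestep = env.reset()

episode_return = 0.0
while not timestep.last():
    # Random-policy stand-in: sample actions uniformly within actuator limits.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    timestep = env.step(action)
    episode_return += timestep.reward or 0.0

print("Episode return with a random policy:", episode_return)
```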

The ability of a policy to learn a song can be used to sort songs (i.e., tasks) by difficulty. The researchers believe that grouping tasks by such criteria can encourage further study in a range of areas related to robot learning, such as curriculum learning and transfer learning. RoboPianist also offers fascinating opportunities for various research directions, such as imitation learning, multi-task learning, zero-shot generalization, and multimodal (sound, vision, and touch) learning. Overall, ROBOPIANIST offers a simple objective, an environment that is easy to replicate, clear evaluation criteria, and plenty of room for future extensions.


Can AI Truly Restore Facial Details from Low-Quality Images? Meet DAEFR: A Dual-Branch Framework for Enhanced Quality

By Mahmoud Ghorbel - September 6, 2023

In the field of image processing, recovering high-definition detail from low-quality facial photographs remains a difficult task. Such tasks are intrinsically hard because of the numerous degradations these images undergo, which frequently cause the loss of essential information. This problem highlights the quality gap between low-quality and high-quality photographs. The question that follows is whether the inherent characteristics of the low-quality domain can be used to better understand and improve the process of facial restoration.

Recent approaches have incorporated codebook priors, autoencoders, and high-quality feature sets to address this issue. Nevertheless, these methods still have a significant weakness: they generally rely on a single encoder trained exclusively on high-quality data, overlooking the particular complexities of low-quality images. Although innovative, such an approach may unintentionally widen the domain gap and miss the subtleties of low-quality data.

A new paper was recently introduced to tackle these issues, presenting a fresh solution. This approach uses an extra “low-quality” branch to pull important details from blurry or unclear images, combining them with clearer picture details to improve face image restoration.

Here’s what stands out about their work:

1. They added a dedicated encoder to capture the unique features of low-quality images, helping bridge the gap between clear and degraded images.

2. Their method mixes details from both low and high-quality images. This mix helps overcome common problems in image restoration, leading to clearer, better results.

3. They introduced a technique called DAEFR to handle blurry or unclear face images.

Concretely, their approach involves several key steps:

1. Discrete Codebook Learning Stage: They establish codebooks for high-quality (HQ) and low-quality (LQ) images. Using vector quantization, they train an autoencoder for self-reconstruction to capture domain-specific information. This stage produces encoders and codebooks for both the HQ and LQ domains.

2. Association Stage: Drawing inspiration from the CLIP model, they associate features from the HQ and LQ domains. Features from the domain-specific encoders are flattened into patches to construct a similarity matrix that measures the closeness of these patches in terms of spatial location and feature level. The goal is to minimize the domain gap and produce two associated encoders that integrate information from both domains.

3. Feature Fusion & Code Prediction Stage: After the associated encoders are obtained, the LQ image is encoded with both of them. A multi-head cross-attention module merges the features from the two encoders, producing a fused feature that encompasses information from both the HQ and LQ domains (a minimal sketch of this fusion step follows below). Subsequently, a transformer predicts the relevant code elements of the HQ codebook, which a decoder then uses to generate the restored HQ image.
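As a rough illustration of the fusion step in stage 3, the following PyTorch sketch merges flattened patch features from the two associated encoders with multi-head cross-attention. The dimensions, the choice of the LQ branch as the query, and the residual/LayerNorm wiring are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Minimal sketch: fuse two feature streams with multi-head cross-attention.

    Feature maps from the two associated encoders are assumed to be flattened
    into token sequences; the LQ-branch tokens act as queries attending over
    the HQ-branch tokens, and the result is combined with the queries through
    a residual connection followed by layer normalization.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lq_tokens: torch.Tensor, hq_tokens: torch.Tensor) -> torch.Tensor:
        # lq_tokens, hq_tokens: (batch, num_patches, dim)
        fused, _ = self.attn(query=lq_tokens, key=hq_tokens, value=hq_tokens)
        return self.norm(lq_tokens + fused)  # residual keeps the LQ information


# Toy usage with 16x16 = 256 flattened patches per branch and 256-dim features.
lq = torch.randn(1, 256, 256)
hq = torch.randn(1, 256, 256)
fused = CrossAttentionFusion()(lq, hq)
print(fused.shape)  # torch.Size([1, 256, 256])
```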

The authors evaluated their method through a series of experiments. They trained their model using the PyTorch framework on the FFHQ dataset of 70,000 high-quality face images. These images were resized and synthetically degraded for training purposes. For testing, they chose four datasets: CelebA-Test and three real-world datasets. Their evaluation metrics ranged from PSNR and SSIM for datasets with ground truth to FID and NIQE for real-world datasets without ground truth. Compared with state-of-the-art methods, their DAEFR model displayed superior perceptual quality on real-world datasets and competitive performance on synthetic datasets. Additionally, an ablation study revealed that using two encoders was optimal, and their proposed multi-head cross-attention module improved feature fusion, underscoring the method’s efficacy in restoring degraded images.

To conclude, this article presented a recently published paper that addresses the challenges of image restoration, particularly for low-quality facial photographs. The researchers introduced a novel method, DAEFR, which harnesses both high- and low-quality image features to produce clearer and more refined restorations. The approach uniquely uses a dual-encoder system, one encoder each for high- and low-quality images, bridging the existing gap between the two domains. The solution was evaluated rigorously, showing notable improvements over previous methods. The paper’s findings underscore the potential of DAEFR to significantly advance the field of image processing, paving the way for more accurate facial image restoration.


Meet XTREME-UP: A Benchmark for Evaluating Multilingual Models with Scarce Data Evaluation, Focusing on Under-Represented Languages

By Tanya Malhotra - May 24, 2023

The fields of Artificial Intelligence and Machine Learning depend fundamentally on data. Everyone is deluged with data from sources like social media, healthcare, and finance, and this data is of great use to applications involving Natural Language Processing (NLP). Yet even with so much data, readily usable data for training an NLP model on a particular task is scarce, and finding high-quality data that passes usefulness and quality filters is difficult. When it comes to developing NLP models for different languages in particular, the lack of data for most languages is a limitation that hinders progress for under-represented languages (ULs).

Emerging tasks like news summarization, sentiment analysis, question answering, and virtual assistants all rely heavily on data availability in high-resource languages. These tasks depend on technologies such as language identification, automatic speech recognition (ASR), and optical character recognition (OCR), which are largely unavailable for under-represented languages. To overcome this, it is important to build datasets and evaluate models on tasks that would benefit UL speakers.

Recently, a team of researchers from Google AI has proposed a benchmark called XTREME-UP (Under-Represented and User-Centric with Paucal Data) that evaluates multilingual models on user-centric tasks in a few-shot learning setting. It primarily focuses on activities that technology users perform in their day-to-day lives, such as information access and input/output activities that enable other technologies. The three main features that distinguish XTREME-UP are its use of scarce data, its user-centric design, and its focus on under-represented languages.

With XTREME-UP, the researchers have introduced a standardized multilingual in-language fine-tuning setting in place of the conventional cross-lingual zero-shot option. This method considers the amount of data that can be generated or annotated in an 8-hour period for a particular language, thus aiming to give the ULs a more useful evaluation setup.

XTREME-UP assesses the performance of language models across 88 under-represented languages on 9 key user-centric technologies, including Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), Machine Translation (MT), and information access tasks of general utility. The researchers developed new datasets specifically for tasks such as OCR, autocomplete, semantic parsing, and transliteration in order to evaluate the capabilities of language models, and they refined the existing datasets used for the benchmark’s other tasks.

One of XTREME-UP’s key capabilities is its ability to assess various modeling scenarios, including both text-only and multi-modal scenarios with visual, audio, and text inputs. It also offers methods for supervised fine-tuning and in-context learning, allowing a thorough assessment of different modeling approaches. The tasks in XTREME-UP involve enabling access to language technology, enabling information access as part of a larger system (such as question answering, information extraction, and virtual assistants), and making information accessible in the speaker’s language.

Consequently, XTREME-UP is a valuable benchmark that addresses the data scarcity challenge in highly multilingual NLP systems. It provides a standardized evaluation framework for under-represented languages and should prove useful for future NLP research and development.


This Artificial Intelligence (AI) Paper Presents A Study On The Model Update Regression Issue In NLP Structured Prediction Tasks

By Tanushree Shenwai - December 6, 2022

Model update regression is the term used to describe the decline in performance on some test cases following a model update, even when the new model performs better than the old model overall. Adoption of the new model can be slowed because its occasionally worse behavior can overshadow the benefits of the overall performance gains.

Consider a scenario where a user’s preferred method of asking about traffic has been ignored by their recently upgraded virtual assistant. Even if the assistant has been enhanced in other ways, this might drastically diminish the user experience.

Model update regression has previously been studied in the context of classification problems in computer vision and natural language processing (NLP). Practical NLP applications that go beyond this classification formalization, however, have received scant treatment.

So far, only a few studies have focused on the model update regression problem in structured prediction tasks. In structured prediction, the global prediction (e.g., a graph or a tree) is typically composed of several local predictions (e.g., nodes and edges), rather than a single prediction as in classification tasks. Some local predictions may be correct even when the global prediction is wrong, so regression from model updates can occur at a granular level. It is also important to note that the output space is input-dependent and can be massive.

New research by Amazon investigates model update regression on one general-purpose task (syntactic dependency parsing) and one application-specific task (conversational semantic parsing). To accomplish this, the researchers establish evaluation methods and test various model updates originating from different sources, such as changes to the model’s architecture, training data, and optimization procedure. Their findings show that regression due to model updates is common and noticeable in many model update settings.

The preceding result demonstrates the universal and pressing need for methods to reduce regression during model updates in structured prediction. Earlier work takes a cue from knowledge distillation, in which a new model (the student) is trained to match the output distribution of the original model (the teacher). Because it is difficult to determine the exact distribution over global predictions, vanilla knowledge distillation cannot be easily applied to structured prediction models. Moreover, there are many different ways to factorize a structured prediction into local predictions.

According to the researchers, a model update is heterogeneous whenever the two models use different factorizations of the output structure. For heterogeneous model updates, factorization-specific approximations, and even knowledge distillation at the level of local predictions, cannot be used. This work instead uses sequence-level knowledge distillation to provide general solutions that do not assume any particular factorization. Model ensembles have provided a strong baseline for minimizing model update regression in prior work, but they are less practical due to their high computational cost.

This work also introduces a generalized technique called Backward-Congruent Re-ranking (BCR). BCR exploits the wide range of outputs available in structured prediction: a new model can generate various predictions with comparable accuracy, and the one with the most backward compatibility can be chosen.

In short, BCR takes a set of candidate structures predicted by the new model and uses the old model as a re-ranker to choose the best one. For better candidate diversity and quality, the team suggests dropout-p sampling, a straightforward and general sampling strategy. Contrary to expectations, BCR is both a flexible and surprisingly effective solution for preventing model update regression, significantly outperforming knowledge distillation and ensemble methods in all model update situations evaluated. Even more surprisingly, the results show that BCR can boost the new model’s accuracy.
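A minimal sketch of the re-ranking step is given below. The `sample()` and `score()` methods are hypothetical stand-ins for the new model's candidate generation (with dropout kept on, in the spirit of dropout-p sampling) and the old model's log-probability scoring; the article does not specify these interfaces.

```python
import torch


def backward_congruent_rerank(new_model, old_model, inputs, num_candidates=8):
    """Sketch of Backward-Congruent Re-ranking (BCR).

    new_model.sample(inputs) is assumed to draw one candidate structure
    (e.g., a parse tree) per call; old_model.score(inputs, candidate) is
    assumed to return the old model's log-probability of that candidate.
    Both interfaces are hypothetical.
    """
    # Keep dropout active at inference time (the idea behind dropout-p sampling)
    # so that repeated sampling yields diverse candidates of comparable quality.
    new_model.train()
    with torch.no_grad():
        candidates = [new_model.sample(inputs) for _ in range(num_candidates)]

    # The old model is used only as a fixed re-ranker.
    old_model.eval()
    with torch.no_grad():
        scores = [old_model.score(inputs, cand) for cand in candidates]

    # The candidate the old model scores highest is the most backward-compatible
    # choice among predictions the new model considers roughly equally good.
    best = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best]
```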

The researchers believe these results are generalizable and can be applied in many contexts beyond Amazon. They suggest that future work should investigate model update regression in other structured prediction tasks, such as text generation in NLP and image segmentation in computer vision, as well as in scenarios involving several rounds of model updates.