Introduction: AI Terminology

This glossary does not cover all existing terms, as it is intended only as an introductory overview. For more comprehensive information, further research is recommended.

Prompting and Query Engineering

Prompt engineering

is the process of structuring or composing a query (prompt) for a language model to obtain the most accurate and useful response. For example, instead of a vague "Tell me about this photo," one could say "Describe this image of a beach at sunset," a query that clearly sets the context.

Zero-shot prompting

is an approach where the model is given only the task itself, without any examples of the desired output. The model relies on general knowledge from pre-training to solve the task immediately. For example, an IT ticket can be classified with nothing but the instruction "Assign a priority to this problem: High, Medium, or Low"; the model infers what is required from the instruction alone.

One-shot prompting

involves giving the model exactly one worked example. For example: "Example: Translate the phrase 'Bonjour' from French to English – the answer is 'Hello'. Now translate: 'Au revoir'." The model sees one example and applies the rule to the new phrase.

Few-shot prompting

is when the query includes several examples (usually 2–10) of tasks with correct answers. The model learns from these examples and then solves a new task. For example, after showing a few phrases labeled with "Positive/Negative" sentiment, the model is asked to classify a new phrase.
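
A minimal sketch of assembling a few-shot sentiment prompt in Python; the commented-out `call_llm` helper is hypothetical and stands in for whatever LLM API is actually used:

```python
# Build a few-shot sentiment-classification prompt from labeled examples.
examples = [
    ("The service was fantastic!", "Positive"),
    ("I waited an hour and no one helped me.", "Negative"),
    ("Great value for the price.", "Positive"),
]

def build_few_shot_prompt(new_text: str) -> str:
    lines = ["Classify the sentiment of each phrase as Positive or Negative.", ""]
    for text, label in examples:
        lines += [f"Phrase: {text}", f"Sentiment: {label}", ""]
    lines += [f"Phrase: {new_text}", "Sentiment:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt("The package arrived damaged.")
# answer = call_llm(prompt)  # hypothetical API call; a model would likely answer "Negative"
print(prompt)
```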

Chain-of-Thought (CoT)

is a technique where the model sequentially derives intermediate reasoning steps before giving the final answer. For example, when asked, “The cafeteria had 23 apples, 20 were eaten, and then 6 more were bought. How many are there now?”, the model might first compute “23 – 20 = 3; 3 + 6 = 9,” and then respond with “9.”
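
Using the cafeteria example above, a Chain-of-Thought prompt can be produced simply by appending a step-by-step instruction (a minimal sketch; the completion shown in the comment is only what a typical model might return):

```python
# Turn a plain question into a Chain-of-Thought prompt.
question = (
    "The cafeteria had 23 apples, 20 were eaten, and then 6 more were bought. "
    "How many are there now?"
)
cot_prompt = question + "\nLet's think step by step, then state the final answer."
# A typical completion: "23 - 20 = 3; 3 + 6 = 9. Final answer: 9."
print(cot_prompt)
```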

ReAct (Reasoning + Acting)

is an approach that combines reasoning and actions. The model formulates "thoughts" (short Chain-of-Thought) and "actions" (e.g., requesting information from an external source), then reasons again. For example, for information retrieval, the model may respond in the format: “Question – Thought – Action – Observation.”
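
The loop below is a toy sketch of the Thought–Action–Observation cycle; `fake_llm` and `fake_search` are hypothetical stubs standing in for a real LLM API and a real retrieval tool:

```python
def fake_llm(prompt: str) -> str:
    # A real model decides between acting and answering; this stub acts once, then answers.
    if "Observation:" in prompt:
        return "Final Answer: <answer derived from the observation>"
    return "I should look this up. Action: search[<query>]"

def fake_search(action: str) -> str:
    return "<search results>"  # stand-in for a real tool call

def react(question: str, max_steps: int = 3) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought = fake_llm(transcript)
        transcript += f"Thought: {thought}\n"
        if "Final Answer:" in thought:
            return thought.split("Final Answer:")[-1].strip()
        transcript += f"Observation: {fake_search(thought)}\n"  # act, then observe
    return "no answer within the step limit"

print(react("<some factual question>"))
```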

Self-Consistency

is a method to improve the reliability of inference: the model generates multiple responses to the same question (usually with CoT), and then the majority opinion among these responses is taken. For example, if out of 5 classifications of an email, "IMPORTANT" appears twice and "NOT IMPORTANT" three times, the final result is "NOT IMPORTANT." This reduces the impact of random variations.
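
The email example above reduces to a simple majority vote over sampled answers; in practice each entry in `samples` would come from a separate LLM call with non-zero temperature (this sketch hard-codes them):

```python
from collections import Counter

# Five sampled classifications of the same email; the majority label wins.
samples = ["IMPORTANT", "NOT IMPORTANT", "NOT IMPORTANT", "IMPORTANT", "NOT IMPORTANT"]
final_label, votes = Counter(samples).most_common(1)[0]
print(final_label, votes)  # -> NOT IMPORTANT 3
```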

Multimodal prompting

involves including different types of data in the prompt (e.g., images + text). The model learns to consider visual context. For example, showing a photo of a party and giving the text prompt “Describe this event” allows the model to integrate visual features and text for a detailed description.

Prompt Chaining

breaks down a complex task into a sequence of related prompts. For example, to process a text in Spanish, you might use a chain like: “1) Read the text. 2) Translate it into English. 3) Extract the main facts. 4) Make a bullet list. 5) Translate the list back into Spanish.” Each step clarifies the task and passes its result to the next.
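
A sketch of such a chain in Python; `call_llm` is a hypothetical stub here, and each step's output is fed into the next prompt:

```python
def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:40]}...>"  # stub standing in for a real LLM call

steps = [
    "Translate the following Spanish text into English:\n{prev}",
    "Extract the main facts from this text:\n{prev}",
    "Turn these facts into a bullet list:\n{prev}",
    "Translate this bullet list back into Spanish:\n{prev}",
]

result = "<original Spanish text>"
for template in steps:
    result = call_llm(template.format(prev=result))  # pass each result to the next step
print(result)
```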

Meta-Prompting

is an advanced technique focusing not on specific content, but on the structure and syntax of the task. In this case, abstract patterns or rules (e.g., “think step by step”) are given in the prompt, independent of the content. This helps standardize the model’s behavior: for example, offering it an abstract reasoning algorithm rather than specific examples.

Dynamic Prompting

adapts prompts on the fly depending on the complexity of the task and the model’s intermediate results. The idea is that the model can change the length of reasoning chains or refine the query if the first answer is not satisfactory. For example, if "think step by step" isn't enough, the model can add more steps. This self-adjustment increases robustness, especially for smaller models.

Retrieval-Augmented Generation (RAG)

is a hybrid of LLM and search. The idea is that before generating a response, the model searches for additional information in external sources and uses it when forming the answer. For example, a chatbot first finds relevant documents or articles (via web search or knowledge base), then generates a response based on the found text. This improves accuracy and reduces "hallucinations" because the model relies on actual data.
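
A toy retrieve-then-generate sketch: retrieval here is simple word overlap and the final LLM call is only indicated in a comment, since both the retriever and the generator would be real components (a vector database, an LLM API) in practice:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy ranking by word overlap; a real system would use embedding similarity.
    q_words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only the context below.\nContext:\n{context}\n\nQuestion: {query}"

docs = [
    "Refund policy: purchases can be returned within 30 days.",
    "Shipping FAQ: standard delivery takes 3 to 5 business days.",
    "Office address and opening hours.",
]
prompt = build_rag_prompt("How long does shipping take?", docs)
# answer = call_llm(prompt)  # hypothetical generation step grounded in the retrieved text
print(prompt)
```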

Self-Prompting

is a method where the model generates auxiliary prompts or examples on its own. For example, an LLM might first generate a fictional "question–answer" pair or a short text and then set itself a follow-up task based on that. The idea is that the model creates its own demonstration examples for self-learning. For instance, it can generate a short informative paragraph and a few related questions, and then use these pairs to answer similar queries later.

Tree of Thought (ToT) Prompting

is an extension of CoT where the model builds multiple parallel lines of reasoning, or "branches" of thought, and selects the most successful ones. It's like drawing a tree of possible solutions instead of a single chain. The model can backtrack and explore alternatives if the first path doesn't lead to a clear solution.

Guided Prompting

embeds specific instructions or constraints in the prompt to guide the model toward the desired outcome. For example, instead of asking "How to create a budget?" you might say "Step-by-step, explain how to create a personal budget for next year," which encourages the model to provide a detailed plan. In essence, the prompt itself supplies the direction the model should follow.

Mixture of Experts (MoE)

is an architecture with a "mixture of experts": the model is divided into several specialized subnetworks (experts), and only a subset is activated for each input. Each expert is trained on a specific part of the data. For example, in an MoE-based LLM a layer contains many FFN blocks (experts), and a "router" selects the most relevant experts for the current token. This allows the model to be scaled up while keeping the computation per token efficient.
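
A toy top-k routing layer in Python/NumPy (sizes and the single-matrix "experts" are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 8, 4, 2
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy experts: one matrix each
router = rng.normal(size=(d, n_experts))                       # router scoring weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router                                        # one score per expert
    top = np.argsort(scores)[-k:]                              # activate only the k best experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # normalize the selected scores
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.normal(size=d)).shape)  # -> (8,)
```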

Adaptive Prompting

is a method where prompts dynamically change based on the model’s reasoning process. The model can adjust the prompts, add clarifications, or rephrase the question during processing. The idea is similar to Self-Consistency and Dynamic Prompting—the model "listens" to itself and adjusts the prompt for better results.

System Prompting

refers to special instructions set not by the user but by the system/developer to define the model’s behavior boundaries. For example, in ChatGPT, a system message like “You are a friendly assistant teaching schoolchildren” makes the model respond in a teacher-like tone. The system prompt sets the tone, style, and limitations (e.g., “do not ask personal questions”) throughout the session.
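
In chat-style APIs the system prompt is simply the first message with the role "system"; a sketch in the OpenAI-style message format (the actual client call is omitted and would depend on the provider):

```python
messages = [
    {
        "role": "system",
        "content": "You are a friendly assistant teaching schoolchildren. "
                   "Do not ask personal questions.",
    },
    {"role": "user", "content": "Why is the sky blue?"},
]
# response = client.chat.completions.create(model="...", messages=messages)  # hypothetical client object
```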

Role Prompting

is a type of system prompting where a specific role or persona is assigned to the model. For example, “You are a history expert” or “you are a critic writing a movie review.” The model then responds in the character of the assigned role. This changes the style, depth, and accuracy of the response to suit the task. In the case of the "reviewer" role, the text becomes more detailed and journalistic.

Context Length Management

involves considering the token limit that a model can process per request. Every LLM has a maximum context window (e.g., GPT-4 — 8K tokens). With long inputs, techniques like compression or splitting the text are used to stay within the limit while preserving key information. For example, splitting a large document into parts or providing a summary before the main task. Context management ensures that important content fits into the model's "memory."
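
A naive chunking sketch; tokens are approximated here by whitespace-separated words, whereas a real pipeline would count tokens with the model's own tokenizer:

```python
def chunk_text(text: str, max_tokens: int = 1000, overlap: int = 100) -> list[str]:
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        start += max_tokens - overlap  # overlap keeps some shared context between chunks
    return chunks

print(len(chunk_text("word " * 2500)))  # -> 3 chunks for ~2500 "tokens"
```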

Few-shot Calibration

is a technique for correcting output biases in few-shot scenarios, for example by balancing the label distribution among the demonstration examples, probing the model with a content-free ("null") input to measure and subtract its default bias, or adding explicitly labeled examples (e.g., "This is an example of a positive review") to stabilize the model's behavior.

Max Tokens

is a parameter in generative model APIs that limits the length of the response. It defines the maximum number of tokens allowed in the model's output. If the sum of input and output tokens exceeds the limit, the generation will be cut off. For example, with max_tokens=500 and 3500 tokens in the input (from a total limit of 4000), only 500 tokens remain for the response. This parameter helps control the volume of generated text and computational costs.
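
The budget arithmetic from the example, spelled out (the parameter name `max_tokens` follows common API conventions, but exact names vary by provider):

```python
context_window = 4000   # total tokens the model can handle per request
input_tokens = 3500     # tokens already consumed by the prompt
max_tokens = min(500, context_window - input_tokens)  # room left for the response
print(max_tokens)  # -> 500
```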

Model Training and Adaptation

Pre-training

is the initial stage of training large models. The model is fed a vast amount of diverse text (from the internet, books, etc.) and learns to predict the next token in a sentence. During this phase, it "reads" billions of words, developing a general understanding of language (grammar, facts, logic); in effect, the model "reads tons of books" to learn a language. Through constant practice at predicting how passages continue, it becomes capable of generating coherent responses.

Fine-tuning

is the process of further training a pre-trained model on a narrowly specialized dataset. Here, the model is adjusted to a specific task (industry-specific text, code, medical records, etc.). For example, GPT can be fine-tuned on legal texts so it can better answer legal questions. Unlike pre-training, fine-tuning requires significantly less data but results in more targeted responses within the model’s domain.

RLHF (Reinforcement Learning from Human Feedback)

is a method where the model is trained using human feedback. First, model responses are collected and rated by people (based on how "human-like," "polite," or "accurate" they are). Based on these ratings, a "reward model" is built, which translates response qualities into numerical rewards. The generative model is then fine-tuned to maximize this reward. The goal is to align the AI with human preferences. For example, in machine translation, RLHF helps select not just the "correct" but also the most "natural" translation.

LoRA (Low-Rank Adaptation)

is an efficient method for adapting large models without full retraining. Instead of modifying all the weights, small "low-rank" matrices are added to each layer and trained for the new task. The main model remains "frozen," and only these small modules are tuned. For example, using LoRA, GPT-3 (with 175 billion parameters) can be adapted by training only 18 million parameters—greatly reducing resource consumption. After training, the LoRA weight matrices can be merged into the model without quality loss.
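
A NumPy sketch of the core idea: the frozen weight matrix is augmented with a trainable low-rank product, so only a tiny fraction of parameters needs gradients (the sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, rank r
B = np.zeros((d_out, r))                # trainable, zero-initialized so training starts from W

def lora_forward(x: np.ndarray) -> np.ndarray:
    return W @ x + B @ (A @ x)          # base output plus low-rank correction

print((d_out + d_in) * r, "trainable vs", d_out * d_in, "frozen parameters")  # 8192 vs 262144
```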

Adapter Layers

follow a similar idea: small trainable modules are inserted between the layers of a pre-trained network. These adapters learn to capture the characteristics of the new task, while the main weights of the model remain unchanged. The advantage is that a single large base model can be used with different adapters for various tasks; for example, BERT with adapters can be reused for text classification or machine translation. Adapters speed up fine-tuning and require less memory than full fine-tuning.

Masked Language Modeling (MLM)

is a pre-training method where the model is trained to fill in the blanks. In the text, some words are randomly masked (hidden), and the model learns to predict what was originally there. For example, from the sentence "I love ___ and ice cream," the model should guess "chocolate." MLM is often used in BERT-like models to help them understand the full context (both left and right of a word) during training.
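
A toy data-preparation step for MLM (simplified: real BERT masking also sometimes keeps the original token or substitutes a random one):

```python
import random

def mask_tokens(tokens: list[str], mask_prob: float = 0.15, seed: int = 1):
    random.seed(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok          # the model must predict this original token
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

print(mask_tokens("I love chocolate and ice cream".split()))
```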

Causal Language Modeling (CLM)

involves training the model to predict the next word in a left-to-right fashion. This is the approach used in GPT-like models: the model reads text from beginning to end and at each step predicts the next token. During this training, the model only sees the already generated part and does not "look ahead." This mode allows GPT to generate text sequentially—from the first word to the last.
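
A greedy decoding sketch that makes the left-to-right constraint explicit; `next_token_probs` is a hypothetical stub standing in for a trained causal LM:

```python
def next_token_probs(prefix: list[str]) -> dict[str, float]:
    return {"<end>": 1.0}  # stub: a real model returns a distribution over its whole vocabulary

def generate(prompt: list[str], max_new_tokens: int = 20) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)      # the model only ever sees the prefix
        next_tok = max(probs, key=probs.get)  # greedy choice of the most likely next token
        if next_tok == "<end>":
            break
        tokens.append(next_tok)
    return tokens

print(generate(["Once", "upon", "a", "time"]))
```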

Next Sentence Prediction (NSP)

is an auxiliary task used during BERT's pre-training. The model receives pairs of sentences and learns to determine whether the second sentence is a "true continuation" of the first or a random one. It checks whether the second sentence logically follows the first. This trains the model to capture higher-level connections between phrases.

Contrastive Learning

is a method for learning data representations using pairs of "similar" and "dissimilar" examples. The goal is to build a feature space where similar objects (images, texts, audio, etc.) are close to each other, while dissimilar ones are far apart. For example, CLIP was trained contrastively: the model pulls together the representations of an image and its matching caption in the embedding space, while pushing unrelated pairs apart. As a result, it "understands" the semantics of both images and text.

Autoencoders

are neural networks trained to reconstruct their input data. In the simplest case, they compress the input ("encode"), pass it through a narrow "bottleneck," and then expand it back ("decode") to try to reproduce the original signal. The goal is to find "hidden" variables and the most important features of the input data. For example, an autoencoder for images might learn to remove noise or compress an image into a compact vector. In the case of Variational Autoencoders (VAE), this approach is extended to generate new samples.

Architectures and Fine-Tuning Methods

PEFT (Parameter-Efficient Fine-Tuning)

is a general term for approaches (including LoRA, adapters, prefix-tuning, and others) that fine-tune only a small portion of a model’s parameters, keeping the rest "frozen." This saves computational resources.

QLoRA (Quantized LoRA)

is a combined approach: the base model is stored in low precision (e.g., 4-bit quantization), while the LoRA adapters are trained in higher precision. This allows for even lower memory requirements during fine-tuning.

P-Tuning v2

extends the idea of prompt tuning. Vector-based ("soft") prompts are added not only at the beginning of the text but also across different layers of the model. These embedded "vector hints" are trained together with the adaptation, allowing the model to better adjust to the task.

LoRA-Fusion

is a method for combining (merging) multiple LoRA modules trained for different tasks into a single model. This allows switching between tasks without retraining the base weights: you simply load or merge the required LoRA adapters.

Delta Tuning

is an approach where, instead of fully retraining the model for a new task, only the differences (deltas) between the original and new weights are trained. This is similar to LoRA or adapters: the main weights are fixed, and only the adjustments are stored.

Hypernetwork Adaptation

uses an auxiliary network (hypernetwork) that generates or adjusts the weights of the main LLM. Instead of direct fine-tuning, the hypernetwork creates weight updates tailored to a specific task. This provides flexible adaptation without modifying the main layers.

Prefix Tuning

is a method where special "prefix tokens" (in the form of trainable vectors) are added to each Transformer layer. The model receives not only the user’s input but also this "implicit" context, which is trainable. This achieves an effect similar to fine-tuning: prefixes define the task at a deeper level with minimal structural changes.

AdapterFusion

is a technique that combines multiple adapters (Adapter Layers) trained for different tasks. A small fusion module learns how to "mix" the outputs of the different adapters at each layer. This allows a single LLM to carry multiple adapters and dynamically combine their effects.

Mixture-of-Experts Routing

is a mechanism where a "router" selects which experts (subnetworks) from a MoE system to activate for each input. For example, it may choose 2–4 experts from a larger pool based on the input signal. This achieves a balance between model power and efficiency.

Prompt-based Distillation

is a method of transferring knowledge from a teacher model to a student model via prompts: the teacher model generates answers to prompts, and the student model is trained to mimic these answers. This allows transferring knowledge from large models into lighter versions.

Model Parallelism

is a technique where parts of a single model are distributed across multiple devices (GPU, TPU) for training or inference. For example, one layer runs on one GPU, another on another. This enables working with very large models that do not fit into the memory of a single device.

Pipeline Parallelism

is a type of distribution where different layers of a model process data sequentially along a pipeline across different devices. While one data chunk is processed through the first layer on device A, another chunk is processed through the second layer on device B, and so on. This increases training throughput.

Sparse Attention

is an optimization of the attention mechanism: instead of full attention over all tokens (O(n²)), attention is limited to local or structured regions. For example, models like Longformer or BigBird use sliding windows or random connections to process long texts more efficiently.
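
A sliding-window attention mask in NumPy (a simplified Longformer-style pattern; real implementations also add a few global tokens):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int = 2) -> np.ndarray:
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window  # True = attention allowed

print(sliding_window_mask(6, window=1).astype(int))
# Allowed pairs grow as O(n * window) instead of O(n^2) for full attention.
```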

FlashAttention

is a highly efficient algorithm for computing exact attention, optimized for GPUs. It uses optimized tiling operations to reduce memory access and speed up computation. Thanks to FlashAttention, processing large contexts is faster without losing accuracy.

Multi-Agent Systems and Management

MCP (Multi-Agent Control Protocol)

a protocol for distributed control of agent groups. It describes how multiple AI agents exchange information and coordinate actions. (Note: "MCP" is sometimes also used to refer to Model-Conditioned Prompting — a prompting technique where the structure of the prompt depends on the model.)

Multi-Agent Reinforcement Learning (MARL)

a method where multiple agents are trained in a shared environment, taking into account their interactions. Each agent learns to make decisions while considering the possible actions of others. It is used, for example, in strategy games or cooperative settings.

Self-Organizing Agents

agents capable of forming interaction structures autonomously without external control. For example, swarm robots that assign roles among themselves based on environmental signals.

Emergent Behavior

the appearance of complex collective effects in a system from simple interaction rules among agents. For example, in bird flock simulations, each bird follows simple rules: stay together, avoid collisions, evade predators — yet the result is a cohesive, intelligent-looking flock.

Swarm Intelligence

an approach inspired by the collective behavior of biological groups (ants, bees). A large number of simple agents (robots, programs) coordinate in a decentralized way to solve tasks (search, patrol), usually through local interaction and signal exchange.

Cooperative AI

a research direction focused on training agents to effectively collaborate with each other and with humans. It includes developing learning methods where agents maximize overall utility rather than just their own. The goal is to build safe multi-agent systems where participants avoid conflicts and deception.

Decentralized Agent Coordination

algorithms for organizing agent work without a central leader. Each agent acts autonomously, with communication and coordination based on local rules or protocols (e.g., local message exchange, oracles).

Agent-Based Modeling

a method for simulating complex systems using sets of autonomous agents. It allows researchers to study how simple-level interactions lead to global effects (economy, ecology, social systems, etc.).

Sim2Real Transfer

a technology for transferring agent behavior learned or tested in simulation into real-world environments. Adaptation methods are used to account for differences in physics, sensor noise, etc. For example, a robot may learn to walk in a simulator and then apply the learned strategies in real life (with additional tuning).

Hierarchical Agent Architectures

multi-level systems where "higher" agents set goals or manage "lower" ones. This allows breaking down complex tasks: for example, the upper layer plans goals, and the lower layer executes movements. Hierarchies improve scalability and explainability of system behavior.

Task Allocation Strategies

methods for distributing tasks among agents. It can be static (manually assigned) or dynamic (agents negotiate who takes what based on load or capability). Example: in a drone swarm, each drone signals how much energy is left and routes are redistributed for territory coverage.

Inter-Agent Communication Protocols

rules and formats for message exchange between agents. These can include formalized question-answer languages, signaling, or shared "chat" between robots. Good protocols allow agents to quickly exchange necessary information (request help, report a target) and coordinate.

Actor-Critic Agents

in reinforcement learning (RL), this is a setup where the "actor" generates actions, and the "critic" evaluates them and provides learning signals. In a multi-agent setting, there may be separate actors and critics for each agent or a shared critic. This simplifies RL: actors learn quickly based on critic evaluations.

Adversarial Agents

agents trained to compete or deceive others. For example, in games where one agent tries to win, it may target the opponent's weaknesses. In training, this is often used to improve robustness: one agent learns a good strategy, and another learns counter-strategies.

Distributed Prompting

a concept where multiple agents jointly generate a prompt. For example, one agent generates an idea, another refines it, and a third checks it. This is similar to "swarm intelligence" in text or solution generation: each agent contributes part of the information.

Autonomous Prompt Optimization

a method where agents automatically improve their prompts to LLMs. For example, an agent may rephrase or expand a prompt based on the model’s response to make it more effective. This iterative optimization helps use LLMs efficiently without human intervention.

Visual and Multimodal Models

Diffusion Models

are a class of generative models (e.g., Stable Diffusion) that first add noise to an image until it becomes random noise, and then learn to reverse this process to reconstruct the original image. This is an alternative to GANs: instead of adversarial training, the model learns to denoise step-by-step. Diffusion Models are currently popular for generating high-quality images.

Score-based Generative Models

are closely related to diffusion models. They use gradients of a "score function" from the data distribution to gradually generate samples. These models learn to estimate how to slightly reduce noise at each step in order to produce realistic outputs.

GANs (Generative Adversarial Networks)

are a class of networks where a generator learns to create data (e.g., images), and a discriminator learns to distinguish real examples from synthetic ones. They "compete" with each other: the generator improves by fooling the discriminator, and the discriminator improves its ability to detect fakes. GANs can generate highly realistic images and are often used in art and media.

VAEs (Variational Autoencoders)

are a special type of autoencoder that also models the distribution of features. They encode data into a parameterized distribution (usually normal) and sample from it during generation. This allows smooth interpolation between samples and the generation of diverse variations. VAEs are widely used for content generation with control over latent variables.

NeRF (Neural Radiance Fields)

is an approach to 3D scene reconstruction from multiple images. NeRF trains a neural network that, given coordinates (x, y, z and viewing angle), predicts the color and density of a point in 3D space. This enables the generation of new viewpoints of a scene (e.g., animating a head turn) based on photographs.

ImageBind

is an architecture (from Meta) that connects different modalities (images, text, audio, sensors) into a shared representation. The model learns to associate information from different sources, making it easier to handle multimodal tasks—for example, understanding speech from lip movements.

Segment Anything Model (SAM)

is a universal image segmentation model (from Meta) capable of identifying and isolating objects without additional training. It was trained on a massive collection of annotated images so it can "segment anything"—from cars on the road to defects on a sheet of paper. To use SAM, you only need to indicate an object with a point or a bounding box, and the model returns a mask.

Visual Question Answering (VQA)

refers to tasks where a model answers questions based on an image. For example, showing an image and asking "How many people are in the photo?" or "What color is the car?" requires a multimodal model to understand both the image and the text query. VQA tests the ability of models to jointly process visual information and natural language.

Vision-Language Pretraining (VLP)

is a strategy for joint pretraining on image-text pairs. Typically, the model is trained on objectives such as matching images to their captions or generating captions from images. The goal is to build a shared representation that links visual and linguistic concepts. CLIP and DALL·E are examples of models built on such image-text pretraining.

Multimodal Fusion

refers to combining various data sources (video, sound, text, images) within a single model. For example, when generating a caption for a video, both visual and audio tracks are considered. This can involve simple feature concatenation or more complex mechanisms like cross-modal attention.

Cross-modal Attention

is an extension of self-attention where the model computes attention between elements of different modalities. For example, in a multimodal transformer, each image "token" can attend to text tokens and vice versa. This allows one modality to "see" the context of another.

Visual Grounding

is the task of linking text to specific elements in an image. For example, if a description says "the person in red on the left," the model should point to that exact person in the image. This is important for understanding which part of the image corresponds to the text.

Captioning Models

are models that generate textual descriptions (captions) for images or videos, explaining objects, actions, and settings. Examples include dedicated captioning models such as BLIP, as well as general multimodal models that can describe what is depicted. These models are trained on image-caption pairs.

Multimodal Retrieval

refers to systems that, given a query in one modality (e.g., text), search for relevant content in another (e.g., images). For example, the query "red car" should return images of cars. Or vice versa: given an image, find related text articles. This works by training a model to create a shared representation of images and text.

Embodied AI

refers to AI systems connected to real-world robots or agents in physical environments. "Embodied" AI includes perception through sensors and actions in the environment. For example, a robot tutor may be trained not only to answer questions but also to move and manipulate objects. This approach combines computer vision, planning, multimodal models, and control.

Audio, Video, and Speech

TTS (Text-to-Speech)

conversion of text into speech. TTS models generate audio signals, intonating and articulating the text. Modern systems (e.g., Tacotron, WaveNet, VITS) can voice text with almost no robotic sound, using deep neural networks and large audio corpora.

ASR (Automatic Speech Recognition)

conversion of audio speech into text. ASR models listen to speech and transcribe it. This includes handling background noise, whispered or overlapping speech, and a variety of accents. Examples: Whisper, DeepSpeech. Models are trained on large speech-text datasets for multiple languages.

Voice Cloning

adaptation of TTS to a specific voice. A short recording of a speaker is enough for the model to generate speech imitating that voice. It uses few-shot or fine-tuning techniques: the model "learns" the speaker’s unique timbre and intonation while preserving articulation.

Speech-to-Text (STT)

synonymous with ASR. Emphasizes that the goal is to convert spoken words into written ones.

Speech Enhancement

technology for improving audio quality: noise suppression, echo cancellation, distortion restoration. Neural networks are often used to remove background noise or restore voice clarity (e.g., Deep Noise Suppression).

Audio Captioning

generating textual descriptions of audio signals. The model listens to a sound (e.g., city street recordings) and generates a description such as: “Sirens, car horns, and crowd noise.” Similar to visual captioning, but for sound.

Audio-Visual Fusion

joint processing of sound and image. For example, synchronizing voice and video (lip-syncing), or emotion recognition from both speech and facial expressions. Multimodal models can understand event context: seeing a person gesture and hearing what they say.

Music Generation

creating music using neural networks. Examples: MusicLM (Google) and Jukebox (OpenAI). These models generate melodies from text or audio prompts, can continue a melody, or create harmonies using large music datasets.

Lip Sync Models

models that synchronize lip movements. They take audio (speech) and generate lip movements of a character or a video of a speaking person. Often used in animation and deepfakes: for example, dubbing translates a character’s speech into another language while preserving facial expressions.

Emotion Recognition in Speech

identifying emotions from voice. The model analyzes tone, intonation, and timbre to determine the speaker’s mood (joy, anger, sadness). Used in customer service and psychology to analyze how something is said, not just what is said.

Reinforcement Learning and Behavior Generation

Imitation Learning

training an agent by observing an expert. Instead of using a reward function (as in RL), the model "copies" the actions of a demonstrator (a human or another agent). For example, teaching a drone to fly by watching videos of experienced pilots.

Inverse Reinforcement Learning

an approach where the model tries to infer the expert’s reward function based on their behavior. In other words, by observing how an expert acted, the system reconstructs what they were trying to optimize. This is useful when goals are hard to formalize directly, but examples of "ideal" agent behavior are available.

Curriculum Learning

a method where tasks gradually increase in difficulty, similar to how humans learn: starting with simple exercises, then progressing to harder ones. For example, a robot first learns to walk on flat ground, then on uneven terrain. This helps achieve complex skills more easily.

Self-Play

agents learn by playing against themselves. In the famous AlphaZero, for example, the agent learned to play chess by playing thousands of games against itself and improving its strategy. This method allows the development of sophisticated strategies even without external examples.

World Models

the idea that an agent builds an internal "model of the world" (often a neural network) that allows it to simulate the environment and plan in its mind. The model is trained to predict the consequences of actions (e.g., forward/backward movement) and then uses this simulation for learning or planning, speeding up real-world experience.

Model-based RL

a method where the agent builds a model of the environment (predicting transitions and rewards) and plans within it. This allows more efficient learning through predictions.

Model-free RL

the classic RL approach where the agent directly learns to maximize reward without building an environmental model (e.g., Q-learning, Policy Gradients). Faster to set up but requires more experience than model-based RL.

AlphaZero-style Training

an iterative self-play method with search (MCTS) and neural network learning. Starting from scratch, the agent learns complex games (chess, go) without any pre-existing data: it plays against itself, improving its strategy through simulations and retraining the neural network on its own games (following an RL + self-play framework).

Proximal Policy Optimization (PPO)

a popular reinforcement learning algorithm. It optimizes the agent’s policy smoothly, preventing large deviations from the current policy. This increases learning stability in stochastic environments.
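
The clipped surrogate objective at the heart of PPO, for a single probability ratio and advantage estimate (a minimal sketch, omitting the value and entropy terms of the full loss):

```python
import numpy as np

def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    clipped = np.clip(ratio, 1 - eps, 1 + eps)             # keep the new/old policy ratio near 1
    return -min(ratio * advantage, clipped * advantage)    # negated: we minimize the loss

print(ppo_clip_loss(ratio=1.5, advantage=1.0))  # ≈ -1.2: the update is capped by the clip
```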

Trust Region Policy Optimization (TRPO)

a predecessor of PPO: an RL algorithm that guarantees the new policy does not deviate too far from the old one (limits the "trust region" of changes). It allows safer policy updates but is more computationally complex.

Soft Actor-Critic (SAC)

an RL algorithm that combines reward maximization with entropy maximization (encouraging randomness in actions). The agent learns to choose not only reward-optimal but also exploratory actions, which speeds up learning and makes the resulting policy more robust.

Safety and Ethics

Constitutional AI

an approach developed by Anthropic where a model is trained to follow a set of "constitutional rules" (ethical instructions) defined by humans. The model's responses are critiqued and revised against these rules by a model rather than by people, and this AI-generated feedback is used for further training. This allows undesirable outputs to be filtered automatically without constant human intervention.

Alignment Tuning

fine-tuning models to align their behavior with human values and goals. It includes techniques like RLHF and Constitutional AI, aiming to ensure that models respond not only correctly but also helpfully, ethically, and accurately from a human perspective.

Safety Layers

additional modules or filters that process a model’s output. For example, after text generation, it may be passed through a "toxicity module" that blocks unacceptable content. This form of "shielding" works alongside the main model to improve safety.

Bias Mitigation

methods for detecting and reducing AI bias. Examples include dataset balancing, additional penalties for inappropriate responses, or adapting models to avoid reproducing stereotypes (based on race, gender, religion, etc.).

Explainable AI (XAI)

approaches that help understand how a model reached its decision. This includes local explanations (e.g., why a specific answer was given), attention visualization, and decision rules. It enhances trust and model debugging.

Transparency Auditing

independent evaluation of AI systems to assess their characteristics: where the data comes from, what decisions the model makes, and why. This is used by companies and regulators to ensure compliance with safety and fairness standards.

Differential Privacy

a technique for privacy-preserving machine learning: carefully calibrated noise is added during training (for example, to clipped gradients in DP-SGD) so that the model's behavior does not reveal whether any individual record was in the training data. This allows models to be trained on sensitive data (medical or financial) while preserving privacy.
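
One common recipe (a DP-SGD-style step) clips each example's gradient and adds Gaussian noise before averaging; the sketch below shows only that step, with illustrative sizes:

```python
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray, clip_norm: float = 1.0, noise_std: float = 0.5) -> np.ndarray:
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)  # bound each example's influence
    noise = np.random.normal(0.0, noise_std * clip_norm, per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)     # noisy average gradient

grads = np.random.default_rng(0).normal(size=(8, 4))  # 8 examples, 4 parameters (toy sizes)
print(dp_sgd_step(grads))
```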

Federated Learning

a method of training a distributed model on user devices without transferring local data. A device trains the model locally, sends only updates (gradients) to a central server, where they are aggregated into a global model. This reduces the risk of private information leakage.

Metrics and Evaluation

BERTScore

a metric for evaluating text generation based on BERT embeddings: it measures the semantic similarity between the generated text and a reference. The closer the vector representations, the higher the score. It better captures meaning than simple word overlap (e.g., BLEU).
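
A very simplified recall-style computation: each reference token embedding is matched to its most similar candidate token embedding and the similarities are averaged. Real BERTScore uses contextual BERT embeddings and IDF weighting; the random vectors here are placeholders:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(5, 16))   # 5 reference-token embeddings (placeholders)
cand_emb = rng.normal(size=(6, 16))  # 6 candidate-token embeddings (placeholders)

recall = np.mean([max(cosine(r, c) for c in cand_emb) for r in ref_emb])
print(round(float(recall), 3))
```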

FactScore

a metric for assessing the factual accuracy of generated text. It measures how well the facts in the model's response align with true knowledge. It often includes a fact extraction subsystem and comparison with knowledge bases.

TruthfulQA

a test set of questions where models often give incorrect answers, showing "hallucinations" or bias. The metric evaluates the percentage of truthful responses. It helps assess how reliably an AI assistant provides factual information.

Winogrande

a dataset for evaluating commonsense reasoning and language understanding. It contains sentences with ambiguous pronouns and requires contextual understanding. The metric is the percentage of correctly resolved cases (e.g., who "he" refers to).

HellaSwag

a dataset of practical "common sense" situations. The model must choose the most logical continuation of a scenario. The metric is the accuracy of choosing the correct option from several. It is used to test general reasoning and language intuition in AI.

GLEU, CHRF

metrics for evaluating translation and generation quality. GLEU is a sentence-level variant of BLEU, while chrF is an F-score computed over character n-gram matches. Both provide numerical similarity scores between machine-generated text and reference texts.

Consistency Metrics

measure how consistent a model is across different responses or minor variations in input. For example, if a model is asked a slightly rephrased question and gives a different answer, the consistency score will be low. This is important for testing the stability and reliability of LLMs.
