A

Activation Function

A mathematical function applied to the weighted sum of a neuron's inputs that introduces non-linearity, allowing the network to learn complex patterns. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, tanh, and softmax.
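
The common activation functions mentioned above can be sketched in a few lines of pure Python; this is a minimal illustration, not how production frameworks implement them (they operate on tensors):

```python
import math

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # converts a list of scores into probabilities that sum to 1;
    # subtracting the max first improves numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0))                                   # 0.0
print(round(sigmoid(0.0), 2))                       # 0.5
print([round(p, 2) for p in softmax([1.0, 1.0])])   # [0.5, 0.5]
```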

Adversarial Attack

A technique where carefully crafted inputs are designed to fool machine learning models into making incorrect predictions. These attacks exploit vulnerabilities in AI systems and are important for understanding model robustness.

Agent

In AI, an autonomous entity that perceives its environment through sensors and acts upon it through actuators to achieve specific goals. AI agents can range from simple rule-based systems to complex autonomous systems capable of learning and adaptation.

AGI (Artificial General Intelligence)

A hypothetical type of AI that would possess human-like cognitive abilities across all domains, capable of understanding, learning, and applying knowledge to any intellectual task that a human can perform.

Algorithm

A step-by-step procedure or formula for solving a problem or accomplishing a task. In machine learning, algorithms are the mathematical procedures that enable models to learn from data.

Alignment

The challenge of ensuring that AI systems behave in accordance with human values and intentions. AI alignment research focuses on making AI systems safe, beneficial, and aligned with human goals.

Attention Mechanism

A technique that allows neural networks to focus on relevant parts of the input when producing output. Self-attention, used in transformers, enables models to weigh the importance of different elements in a sequence.
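
The core of self-attention, scaled dot-product attention, can be sketched on toy Python lists. This is a simplified single-head version without learned projection matrices, which real transformers include:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention.
    Q, K, V are lists of vectors (lists of floats)."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        # softmax turns scores into attention weights that sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # output is the attention-weighted sum of the value vectors
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

# One query attending over two key/value pairs; the query matches
# the first key more strongly, so the first value dominates.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(Q, K, V))
```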

B

Backpropagation

The fundamental algorithm for training neural networks. It calculates the gradient of the loss function with respect to each weight by propagating errors backward through the network, enabling weight updates via gradient descent.
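
For a single neuron with squared-error loss, the chain rule behind backpropagation can be written out by hand. The values here are arbitrary toy inputs chosen for illustration:

```python
# Single neuron: y_hat = w * x + b, loss = (y_hat - y)**2
w, b = 0.5, 0.0     # learnable parameters
x, y = 2.0, 3.0     # one training example (input, target)
lr = 0.05           # learning rate

for _ in range(50):
    y_hat = w * x + b
    # chain rule: dL/dw = 2*(y_hat - y) * x, dL/db = 2*(y_hat - y)
    grad_w = 2 * (y_hat - y) * x
    grad_b = 2 * (y_hat - y)
    # gradient-descent update using the backpropagated gradients
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w * x + b, 3))  # prediction converges toward the target 3.0
```

In a deep network the same chain-rule computation is repeated layer by layer, propagating the error signal backward from the loss to every weight.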

Batch Size

The number of training examples used in one iteration of model training. Larger batch sizes provide more stable gradient estimates but require more memory, while smaller batches can offer regularization benefits.

BERT (Bidirectional Encoder Representations from Transformers)

A transformer-based language model developed by Google that revolutionized NLP by pre-training on large text corpora and understanding context from both directions. BERT excels at tasks like question answering and sentiment analysis.

Bias (in ML)

Can refer to: (1) a parameter in neural networks that shifts activation functions, or (2) systematic errors in training data or algorithms that lead to unfair or skewed predictions, often reflecting societal prejudices.

C

Chain-of-Thought (CoT) Prompting

A prompting technique that encourages language models to break down complex reasoning into intermediate steps, significantly improving performance on mathematical, logical, and multi-step reasoning tasks.

Classification

A supervised learning task where the model predicts which category or class an input belongs to. Examples include spam detection (spam/not spam), image classification (cat/dog), and sentiment analysis (positive/negative).

Clustering

An unsupervised learning technique that groups similar data points together without predefined labels. Common algorithms include K-means, hierarchical clustering, and DBSCAN.
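
K-means, the most common clustering algorithm, alternates between assigning points to the nearest center and moving each center to its cluster's mean. A minimal pure-Python sketch for 2-D points (real uses would call a library such as scikit-learn):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means on 2-D points (lists of [x, y])."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                     for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # update step: move each center to its cluster's mean
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = [sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl)]
    return centers

# Two well-separated blobs; K-means finds one center per blob
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(sorted(kmeans(pts, 2)))
```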

CNN (Convolutional Neural Network)

A type of neural network specialized for processing grid-like data such as images. CNNs use convolutional layers with learnable filters to detect features like edges, shapes, and textures hierarchically.

Computer Vision

A field of AI focused on enabling machines to interpret and understand visual information from images and videos, including tasks like object detection, image segmentation, and facial recognition.

Context Window

The maximum amount of text (measured in tokens) that a language model can process in a single input. Modern LLMs have context windows ranging from thousands to over a million tokens.

D

Data Augmentation

Techniques for artificially increasing training data size by creating modified versions of existing data. For images, this includes rotations, flips, and color adjustments; for text, it includes paraphrasing and back-translation.

Deep Learning

A subset of machine learning using neural networks with multiple layers (deep neural networks) to learn hierarchical representations of data. Deep learning has driven breakthroughs in image recognition, NLP, and many other domains.

Diffusion Model

A type of generative model that learns to create data by reversing a gradual noising process. Diffusion models power state-of-the-art image generators like Stable Diffusion, DALL-E, and Midjourney.

Dropout

A regularization technique where randomly selected neurons are temporarily deactivated (set to zero) during training to prevent overfitting and improve generalization. Dropout helps neural networks become more robust.
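
A minimal sketch of "inverted" dropout, the variant used in practice: surviving activations are scaled up during training so that nothing needs rescaling at inference time.

```python
import random

def dropout(activations, p, training=True, seed=None):
    """Inverted dropout: zero each activation with probability p during
    training, scaling survivors by 1/(1-p) so the expected value is
    unchanged. At inference time the input passes through untouched."""
    if not training or p == 0.0:
        return activations
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else a / (1.0 - p)
            for a in activations]

acts = [0.5, 1.0, 1.5, 2.0]
print(dropout(acts, p=0.5, seed=42))         # some values zeroed, rest doubled
print(dropout(acts, p=0.5, training=False))  # unchanged at inference time
```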

E

Embedding

A dense vector representation of data (like words, sentences, or images) in a continuous vector space where similar items are positioned close together. Embeddings enable ML models to understand relationships between entities.
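
Closeness between embeddings is usually measured with cosine similarity. The 3-dimensional vectors below are made-up toy values; real embedding models produce vectors with hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # 1.0 for identical directions, 0.0 for orthogonal vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical 3-d "embeddings" for illustration only
king  = [0.9, 0.8, 0.1]
queen = [0.9, 0.7, 0.2]
apple = [0.1, 0.2, 0.9]
print(round(cosine_similarity(king, queen), 3))  # high: related concepts
print(round(cosine_similarity(king, apple), 3))  # low: unrelated concepts
```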

Encoder-Decoder Architecture

A neural network design where an encoder processes input into a compressed representation, and a decoder generates output from that representation. Used in translation, summarization, and image captioning.

Epoch

One complete pass through the entire training dataset during model training. Training typically involves multiple epochs to allow the model to learn patterns effectively.
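
The relationship between epochs, batches, and optimization steps can be sketched as a training-loop skeleton; `iterate_minibatches` is a hypothetical helper written for this example, not a standard library function:

```python
def iterate_minibatches(data, batch_size):
    # yield successive slices of the dataset as batches
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(10))   # a stand-in for 10 training examples
epochs = 3
steps = 0
for epoch in range(epochs):                   # one epoch = one full pass
    for batch in iterate_minibatches(data, batch_size=4):
        steps += 1                            # one optimization step per batch
print(steps)  # 3 epochs * 3 batches per epoch (4 + 4 + 2 examples) = 9
```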

F

Feature

An individual measurable property or characteristic of the data being observed. In machine learning, features are the input variables used by models to make predictions.

Few-Shot Learning

The ability of a model to learn new tasks from only a few examples. Few-shot prompting typically includes a handful of examples (often two to five) in the prompt to guide language models toward the desired output.

Fine-Tuning

The process of taking a pre-trained model and further training it on a specific dataset or task. Fine-tuning adapts general-purpose models to specialized applications.

Foundation Model

Large AI models trained on broad data that can be adapted to many downstream tasks. Examples include GPT, BERT, and CLIP. Foundation models serve as the base for numerous applications.

G

GAN (Generative Adversarial Network)

A framework where two neural networks compete: a generator creates fake data while a discriminator tries to distinguish fake from real. GANs have been influential in image generation and creative AI.

Generative AI

AI systems capable of creating new content, including text, images, audio, video, and code. Generative AI has transformed creative industries and knowledge work.

GPT (Generative Pre-trained Transformer)

A family of large language models developed by OpenAI based on the transformer architecture. GPT models are pre-trained on vast text corpora and fine-tuned for various applications.

Gradient Descent

An optimization algorithm used to minimize the loss function by iteratively adjusting model parameters in the direction of steepest descent. Variants include SGD, Adam, and RMSprop.
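
Gradient descent in one dimension, minimizing the toy loss f(x) = (x - 3)²; each step moves opposite the derivative, scaled by the learning rate:

```python
def grad(x):
    # derivative of the toy loss f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0       # starting point
lr = 0.1      # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)   # step in the direction of steepest descent

print(round(x, 4))  # converges to the minimum at x = 3.0
```

Adam and RMSprop follow the same loop but adapt the step size per parameter using running statistics of past gradients.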

H

Hallucination

When AI models generate false, misleading, or fabricated information presented as fact. Hallucinations are a significant challenge for language models that can confidently produce incorrect outputs.

Hyperparameter

Configuration settings external to the model that influence training, such as learning rate, batch size, and number of layers. Hyperparameter tuning is crucial for optimal model performance.

I

Inference

The process of using a trained model to make predictions on new, unseen data. Inference speed and efficiency are critical for deploying AI in production systems.

In-Context Learning

The ability of large language models to learn tasks from examples provided in the prompt, without updating model weights. This enables flexible, dynamic task adaptation.

L

Large Language Model (LLM)

Neural network models with billions of parameters trained on massive text datasets to understand and generate human language. LLMs like GPT, Claude, and Gemini power modern AI assistants.

Learning Rate

A hyperparameter that controls how much model weights are adjusted during each training iteration. Too high causes instability; too low slows learning.

Loss Function

A mathematical function that measures how well a model's predictions match the actual values. Training aims to minimize this function through optimization algorithms.
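
Two of the most common loss functions, sketched in pure Python: mean squared error for regression and binary cross-entropy for classification.

```python
import math

def mse(y_true, y_pred):
    # mean squared error: average squared gap between prediction and truth
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_prob):
    # y_true in {0, 1}; y_prob is the predicted probability of class 1
    eps = 1e-12  # guard against log(0)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_prob)) / len(y_true)

print(mse([3.0, 5.0], [2.5, 5.5]))  # 0.25
# confident, correct probabilities give a small cross-entropy loss
print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))
```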

LSTM (Long Short-Term Memory)

A type of recurrent neural network designed to learn long-term dependencies through gating mechanisms that control information flow. LSTMs address the vanishing gradient problem in standard RNNs.

M

Machine Learning (ML)

A subset of AI where systems learn patterns from data to make predictions or decisions without being explicitly programmed. ML encompasses supervised, unsupervised, and reinforcement learning.

Model

A mathematical representation learned from data that can make predictions or generate outputs. Models vary from simple linear regressions to complex neural networks with billions of parameters.

Multimodal

AI systems that can process and understand multiple types of data (modalities) such as text, images, audio, and video simultaneously. Modern LLMs increasingly support multimodal capabilities.

N

Natural Language Processing (NLP)

A field of AI focused on enabling computers to understand, interpret, and generate human language. NLP powers chatbots, translation, sentiment analysis, and text summarization.

Neural Network

A computing system inspired by biological neural networks, consisting of interconnected nodes (neurons) organized in layers that process information and learn patterns from data.

O

Object Detection

A computer vision task that identifies and locates objects within images or video, drawing bounding boxes around detected items. YOLO and R-CNN are popular object detection architectures.

Overfitting

When a model learns the training data too well, including noise and outliers, resulting in poor performance on new data. Regularization techniques help prevent overfitting.

P

Parameter

The internal variables of a model that are learned from training data, such as weights and biases in neural networks. Large language models have billions of parameters.

Pre-Training

Training a model on a large, general dataset before fine-tuning on specific tasks. Pre-training enables transfer learning and is fundamental to foundation models.

Prompt

The input text or instructions given to an AI model to guide its output. Effective prompt engineering is crucial for getting optimal results from language models.

R

RAG (Retrieval-Augmented Generation)

A technique combining language models with external knowledge retrieval to provide more accurate, up-to-date, and verifiable responses by grounding generation in retrieved documents.

Regression

A supervised learning task where the model predicts continuous numerical values, such as house prices, temperature, or stock values.
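
The simplest regression model, a straight line fit by ordinary least squares, has a closed-form solution that fits in a few lines. The data below lie exactly on y = 2x + 1 so the fit is easy to verify:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Noise-free data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```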

Reinforcement Learning (RL)

A type of machine learning where an agent learns to make decisions by interacting with an environment and receiving rewards or penalties for its actions.

RNN (Recurrent Neural Network)

A neural network architecture designed for sequential data, where connections between nodes form directed cycles, allowing information to persist across time steps.

S

Self-Supervised Learning

A learning paradigm where the model generates its own labels from the input data, such as predicting masked words in a sentence. This enables training on vast unlabeled datasets.
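
The label-generation idea can be illustrated with masked-word prediction: each training pair is derived mechanically from raw text, with no human annotation. A toy sketch (real models like BERT mask subword tokens, not whole words):

```python
def make_masked_examples(sentence, mask="[MASK]"):
    """Derive (input, target) pairs from raw text: each word in turn
    is masked and becomes the word the model must predict."""
    words = sentence.split()
    examples = []
    for i, w in enumerate(words):
        masked = words[:i] + [mask] + words[i + 1:]
        examples.append((" ".join(masked), w))
    return examples

for inp, target in make_masked_examples("the cat sat down"):
    print(inp, "->", target)
```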

Semantic Search

Search that understands the meaning and intent behind queries rather than just matching keywords, typically powered by embeddings and vector similarity.

Supervised Learning

A type of machine learning where the model learns from labeled examples, mapping inputs to known outputs. Classification and regression are common supervised learning tasks.

T

Token

The basic unit of text that language models process, typically representing a word, part of a word, or punctuation. Tokenization breaks text into these units for model consumption.
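
A toy word-and-punctuation tokenizer shows the idea; production models instead use learned subword schemes such as BPE or WordPiece, which split rare words into smaller pieces:

```python
import re

def simple_tokenize(text):
    # split into runs of word characters or single punctuation marks;
    # a crude stand-in for real subword tokenization
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Tokenization isn't trivial!"))
# ['Tokenization', 'isn', "'", 't', 'trivial', '!']
```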

Training

The process of teaching a model by exposing it to data and adjusting its parameters to minimize prediction errors. Training requires data, compute resources, and optimization algorithms.

Transfer Learning

Applying knowledge learned from one task or domain to a different but related task. Transfer learning enables efficient training with less data and compute.

Transformer

A neural network architecture based on self-attention mechanisms that has become the foundation of modern NLP and increasingly other domains. Transformers enable parallel processing and capture long-range dependencies.

U

Unsupervised Learning

A type of machine learning where the model finds patterns in data without labeled examples. Clustering and dimensionality reduction are common unsupervised tasks.

V

Vector Database

A database optimized for storing and querying high-dimensional vectors (embeddings). Vector databases enable efficient similarity search for RAG, recommendation systems, and semantic search.

W

Weight

A numerical parameter in neural networks that determines the strength of connections between neurons. Weights are adjusted during training to optimize model performance.

Z

Zero-Shot Learning

The ability of a model to perform tasks it wasn't explicitly trained on, without any examples. Large language models demonstrate impressive zero-shot capabilities across many tasks.

Expanding Your AI Vocabulary

This glossary covers essential AI terminology. As the field evolves rapidly, new terms and concepts emerge frequently. Explore our topic pages for deeper understanding of each concept in context.