• A* algorithm: a method for finding the shortest path in a graph that guides its search with a heuristic estimate of the remaining cost to the goal.
  • A3C (Asynchronous Advantage Actor-Critic): A method for solving reinforcement learning problems that is able to work with multiple parallel agents and update the model asynchronously.
  • Action: The choices available to the agent at a given state.
  • Actor-Critic: A method that uses two separate neural networks: one to represent the policy (actor) and one to represent the value function (critic).
  • AdaBoost: a method that combines multiple weak models, such as decision stumps, into a stronger model by iteratively reweighting the training examples that previous models misclassified.
  • Adadelta: An optimization algorithm (a variant of gradient descent) that adapts the learning rate for each parameter using a running window of past gradient information, making it robust to the scale of the parameters.
  • Adagrad: An optimization algorithm that adapts the learning rate for each parameter based on the accumulated historical gradient information and performs well on problems with sparse gradients.
  • Adam: An optimization algorithm that adapts the learning rate for each parameter using running estimates of the first and second moments of the gradients.
  • Adversarial Attack: a method that generates adversarial examples by modifying the input in a way that is intended to fool the model, but is difficult to detect by humans.
  • Adversarial Defense: a method that defends against adversarial attacks by identifying and rejecting adversarial examples or by making the model more robust to them.
  • Adversarial Training: a method that trains a model by exposing it to adversarial examples, which are examples that are carefully crafted to fool the model. This improves the robustness of the model against malicious attacks and input perturbations.
  • Affinity Propagation: A method that groups the data into clusters by sending “messages” between the data points, where each message represents the suitability of one data point as the exemplar of a cluster.
  • Agent: The entity that takes actions in an environment.
  • Algorithm: a set of instructions for completing a task, often used in computer programming and problem-solving.
  • Alternating Direction Method of Multipliers (ADMM): A method that solves optimization problems that are hard to solve directly by decomposing them into smaller subproblems that can be solved independently and coordinating the subproblems through dual variables.
  • AMSGrad: A variant of Adam that uses the maximum of past squared gradients to scale the learning rate, which leads to more robust optimization.
  • Annealing: A method that gradually decreases the temperature in a simulated annealing algorithm to find the global minimum of a function.
  • Anomaly Detection: a method that identifies data points or patterns that are unusual or deviate from the normal behavior of the data.
  • Approximation algorithm: an algorithm that finds a solution that is close to the optimal solution, but not necessarily the best.
  • Apriori : a method for finding frequent item sets in a database of transactions, used in market basket analysis and association rule learning.
  • Asymptotic analysis: the study of how the running time of an algorithm changes as the input size becomes arbitrarily large.
  • Attention Mechanism: a technique used in neural networks to weigh and selectively focus on different parts of the input (or of intermediate representations) while processing it, used in tasks such as machine translation and image captioning.
  • Autoencoder-based anomaly detection: A method that trains an autoencoder to reconstruct normal data and then uses the reconstruction error as a measure of anomaly.
  • Autoencoder: a type of neural network used for dimensionality reduction and feature learning; an encoder maps the input to a lower-dimensional representation and a decoder reconstructs the input from that representation.
  • Autoregressive Models: a class of models that generate output by predicting the next value in a sequence based on the previous values. Examples include the ARIMA and GARCH models in time series analysis and language models in NLP tasks.
  • Backpropagation: a method for training a neural network by calculating the gradient of the error with respect to the weights using the chain rule (a minimal numeric sketch appears after this list).
  • Backtracking: a technique for finding all possible solutions to a problem by trying different options and undoing them if they don’t lead to a solution.
  • Bagging (Bootstrap Aggregating): A method that trains multiple models independently on different bootstrap samples of the data and aggregates their predictions by taking the average or majority vote.
  • Barrier: A method that finds the optimal solution of a convex optimization problem by adding a barrier function that penalizes solutions approaching the constraint boundary.
  • Batch Gradient Descent: A variant of gradient descent that uses the full dataset to compute the gradients at each iteration.
  • Batch Normalization: A method that normalizes the activations of a layer in a neural network by subtracting the mini-batch mean and dividing by the standard deviation, in order to stabilize training and improve generalization.
  • Bayesian Inference: The process of using Bayes’ theorem to update the probability of a hypothesis as more data is collected.
  • Bayesian Networks: A graphical model that represents the relationships between random variables and the conditional dependencies between them using a directed acyclic graph.
  • Bayesian Optimization: A method of hyperparameter tuning that builds a probabilistic model of a model’s performance as a function of its hyperparameters and uses this model to select the next set of hyperparameters to evaluate.
  • Bayesian Statistics: A branch of statistics that uses Bayes’ theorem to update the probability of a hypothesis as more data is collected.
  • Bellman-Ford algorithm: a method to find the shortest path in a graph with negative edge weights.
  • BERT (Bidirectional Encoder Representations from Transformers): a transformer-based neural network pre-trained on a large corpus of text, used for natural language understanding tasks such as question answering and sentiment analysis.
  • BERT-of-Theseus: a model compression method that progressively replaces modules of a pre-trained BERT with smaller substitute modules during training, yielding a compact model with comparable performance.
  • BERTweet (BERT for Twitter): A model pre-trained on a large corpus of tweets and designed to handle Twitter-specific linguistic patterns and variations.
  • BFS (Breadth-first search): a method to explore a graph by visiting all the vertices at a given depth before moving on to the next depth level.
  • Big O notation: a way to express an upper bound on an algorithm’s time complexity, usually used to describe the worst-case scenario.
  • Big omega notation: a way to express a lower bound on an algorithm’s time complexity.
  • Big theta notation: a way to express a tight bound on an algorithm’s time complexity, i.e. when the upper and lower bounds match.
  • Bipartite Graph: A graph where the nodes can be divided into two sets such that there are no edges between nodes within the same set.
  • BIRCH: A hierarchical clustering method that incrementally builds a compact tree-based summary of the data (a clustering feature tree), making it scalable to large datasets.
  • Boltzmann machine: a type of stochastic recurrent neural network, used for generative modeling and dimensionality reduction.
  • Boosting: A method that trains multiple models sequentially, where each model tries to correct the mistakes of the previous ones, and aggregates their predictions by giving more weight to the better-performing models.
  • Bootstrap: A method that generates new samples from the original data by randomly selecting data points with replacement.
  • Branch and bound: a technique that searches a tree of candidate solutions, using bounds on the best achievable solution in each branch to prune branches that cannot improve on the best solution found so far.
  • Brute force: a straightforward and simple approach to solving a problem by trying all possible solutions.
  • Capsule Networks: a type of neural network that uses “capsules” as the basic building block to model hierarchical relationships in data such as images and text.
  • CatBoost: A gradient boosting framework that uses an ensemble of decision trees, like XGBoost and LightGBM, with additional optimizations designed to work well with categorical features.
  • CG (Conjugate Gradient): A method that uses the conjugate gradient algorithm to optimize the parameters.
  • Chromosome: The genetic representation of an individual in the population.
  • Clustering: a method that groups similar data points together into clusters based on their features or a similarity measure.
  • Complete Graph: A graph where there is an edge between any two nodes.
  • Conjugate Prior: A prior probability distribution that belongs to the same family as the likelihood function, which makes it easy to calculate the posterior probability.
  • Connected Graph: A graph where there is a path between any two nodes.
  • Convolutional Neural Network (CNN): a type of neural network particularly well suited to image and signal processing tasks, characterized by convolutional layers that automatically and adaptively learn spatial hierarchies of features from the input data.
  • Coordinate Descent: A method that optimizes the parameters of a model by iteratively updating one parameter at a time, while keeping the other parameters fixed.
  • Cross-Entropy Method: A method that uses the concept of cross-entropy to estimate the optimal parameters of a probability distribution.
  • Crossover: The process of combining the genetic material of two individuals to create a new offspring.
  • Cross-validation: A method that divides the data into several subsets (folds) and repeatedly trains the model on some folds while validating it on the remaining fold, in order to evaluate the model’s performance and detect overfitting (see the k-fold sketch after this list).
  • CTRL (Conditional Transformer Language Model): a type of transformer-based neural network that generates text conditioned on given context, such as a prompt or a set of labels.
  • Cutout: A regularization technique that randomly masks out rectangular regions of the input images during training.
  • DAgger (Dataset Aggregation): an imitation learning method that iteratively runs the learned model, collects expert feedback on the situations it encounters, aggregates this data into the training set, and retrains the model to improve its performance on a task.
  • Data Augmentation: A method that generates new training samples by applying various transformations to the existing samples, such as rotation, flipping, or cropping.
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise): A method that groups the data into clusters based on density, by defining clusters as regions of high density separated by regions of low density.
  • DDPG (Deep Deterministic Policy Gradients): A method for solving continuous control tasks in reinforcement learning by using a deep neural network to represent the actor and the critic.
  • Decision tree: a method for classification and regression that uses a tree structure to make predictions.
  • Deep Belief Networks (DBN): A method that uses unsupervised pre-training followed by supervised fine-tuning to detect anomalies.
  • Deep Learning: A subfield of machine learning that uses neural networks with many layers to learn complex patterns and representations from data.
  • Dependency parsing: The process of analyzing the grammatical structure of a sentence, and identifying the relationships between the words, such as subject, object, modifier, etc.
  • DFS (Depth-first search): a method to explore a graph by visiting a vertex and then recursively visiting all its unvisited adjacent vertices.
  • Differentiable Architecture Search (DARTS): A method that uses gradient-based optimization to search for the best architecture of a neural network.
  • Dijkstra’s algorithm: a method to find the shortest path in a graph with non-negative edge weights (see the sketch after this list).
  • Dimensionality Reduction: a method that reduces the number of features or dimensions of the data while preserving the important information.
  • Directed Graph: A graph where the edges have a direction, and the edges are ordered pairs of nodes (u,v) where u is the source and v is the destination.
  • Divide and conquer: a technique for solving problems by breaking them down into smaller subproblems and solving each subproblem individually.
  • DQN (Deep Q-Network): A method for solving reinforcement learning problems by using a deep neural network to approximate the Q-function.
  • DropConnect: A regularization technique that drops out individual weights in a neural network instead of entire neurons.
  • Dropout: A regularization technique that randomly drops out a fraction of the neurons in a neural network during training, preventing overfitting by making the model less reliant on any individual neuron.
  • Dual Decomposition: A method that solves optimization problems that are hard to solve directly by decomposing them into smaller subproblems that can be solved independently and coordinating them through dual variables.
  • Dual Encoder: A type of model for text similarity and text entailment tasks, composed of two encoders: one for the premise and one for the hypothesis.
  • Dynamic programming: a technique for solving complex problems by breaking them down into simpler subproblems and storing their solutions to avoid redundant work.
  • Early Stopping: A regularization technique that stops training a model when its performance on a validation set stops improving, in order to prevent overfitting from training for too many iterations.
  • EBM (Energy-Based Model): a type of generative model that assigns a scalar energy value to each sample and learns to generate samples with low energy values.
  • Edge: A connection between two nodes in a graph that can represent a relationship or a similarity between the connected objects or entities.
  • Eigenvalue and eigenvector: for a square matrix A, a scalar λ and nonzero vector v such that Av = λv; a concept used throughout linear algebra, physics, engineering, and computer science.
  • Embedding: a technique used to represent discrete data in a continuous vector space, commonly used in natural language processing and computer vision tasks.
  • Ensemble Learning: A technique that combines the predictions of multiple models to improve the overall performance.
  • Environment: The system or scenario in which the agent operates.
  • Evolutionary Algorithm: A method that uses techniques from evolutionary computation, such as genetic algorithms, evolution strategies, and genetic programming, to optimize the parameters or hyperparameters of a model or to find solutions to problems.
  • Evolutionary Computation: A field of study that focuses on the use of techniques inspired by natural evolution, such as genetic algorithms and genetic programming, to optimize complex systems and solve problems.
  • Expectation-Maximization (EM): a method for finding maximum likelihood estimates of the parameters of a statistical model that depends on unobserved latent variables, by iteratively estimating the latent variables and re-estimating the parameters from the observed data.
  • PageRank: a method for ranking web pages in order of importance, used by Google’s search engine.
  • Factor Analysis: A method that finds latent variables or factors that explain the observed variables.
  • Few-shot Learning: A method that allows a model to learn a new task or class from a small number of examples.
  • Fine-tuning: a method of training a pre-trained neural network on a specific task using a small amount of labeled data.
  • Fitness: The measure of the quality or performance of an individual in the population.
  • Flow-based models: a type of generative model that uses normalizing flows to transform a simple base distribution into a complex target distribution.
  • Floyd-Warshall algorithm: a method to find the shortest paths between all pairs of vertices in a graph.
  • GAN (Generative Adversarial Network): a type of generative model consisting of two neural networks, a generator that produces new data samples and a discriminator that tries to distinguish the generated samples from real ones; the two networks are trained in an adversarial manner to improve the quality of the generated samples.
  • Gaussian Mixture Model (GMM): a method that models the data as a mixture of Gaussian distributions, and finds the maximum likelihood estimates of the parameters of these distributions and the latent variables that indicate the assignment of each data point to a particular distribution.
  • Gene: A unit of the chromosome that encodes a specific aspect of the solution.
  • Generative models for speech synthesis: a type of neural network that generates speech from text or other forms of input, such as a speaker’s voice or an image.
  • Generative Models: a class of models that generate new data samples from some underlying probability distribution. Examples include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Normalizing Flow-based models (NFs).
  • Generative Pre-training Transformer (GPT): a type of transformer-based neural network pre-trained on a large corpus of text, used for natural language processing tasks such as language translation, text summarization, and text completion.
  • Genetic Algorithm: an algorithm that mimics the process of natural selection, evolving a population of candidate solutions through selection, crossover, and mutation to find an optimal or near-optimal solution.
  • Gibbs Sampling: A method that generates samples from a probability distribution by sampling from the full conditional distributions of each variable, given the current values of the other variables.
  • GMM (Gaussian Mixture Model): A method that models each cluster as a Gaussian distribution and fits the parameters of these distributions to the data.
  • GNN (Graph Neural Network): a type of neural network designed to process graph-structured data such as social networks, molecular graphs, and road networks.
  • GPT (Generative Pre-training Transformer): a type of transformer-based neural network pre-trained on a large corpus of text, optimized for natural language understanding tasks such as question answering and sentiment analysis, and can also generate text.
  • GPT-2 (Generative Pre-training Transformer 2): a more advanced version of GPT, pre-trained on an even larger corpus of text, with improved performance on various natural language understanding and generation tasks.
  • GPT-3 (Generative Pre-training Transformer 3): a type of transformer-based neural network pre-trained on a very large corpus of text, used for natural language processing tasks such as language translation, text summarization, and text completion.
  • Gradient Boosting: a method that combines multiple weak models, such as decision trees, to create a stronger model.
  • Gradient Descent: an optimization algorithm that finds a minimum of a function, such as a model’s loss, by iteratively updating the parameters in the direction of the negative gradient (see the sketch after this list).
  • Graph Attention Networks (GATs): a method that extends graph convolutional networks by introducing attention mechanisms to weight the importance of different edges and vertices in the graph.
  • Graph Convolutional Networks (GCNs): a method that extends traditional convolutional neural networks to graph-structured data by defining a convolution operation on the graph.
  • Graph Isomorphism Networks (GINs): a type of graph neural network designed to distinguish between non-isomorphic graphs, i.e. graphs with different structures.
  • Graph Theory: A field of mathematics that studies the properties of graphs, which are structures made up of nodes and edges.
  • Graph: A data structure that consists of a set of nodes and a set of edges connecting them.
  • GraphSAGE (Graph Sample and Aggregate): a method that uses a graph convolutional network with learned neighborhood aggregation functions, so the resulting embeddings generalize inductively to new vertices and edges in large graphs.
  • Greedy algorithm: a type of algorithm that makes locally optimal choices at each step in the hope of finding a global optimum.
  • Grid Search: A method of hyperparameter tuning that specifies a set of possible values for each hyperparameter and trains and evaluates a model for every combination of these values.
  • GRU (Gated Recurrent Unit): a type of RNN that uses gating mechanisms to control the flow of information in the hidden state, similar to LSTMs but with a simpler structure.
  • Hamiltonian Monte Carlo (HMC): A method that generates samples from a probability distribution by using gradient information to construct a Markov Chain that has the desired distribution as its equilibrium distribution.
  • Heuristic: an algorithm that attempts to find a solution that is good enough, rather than an optimal solution.
  • Hidden Markov Model (HMM): a method that models the data as a sequence of hidden states and observed emissions, and finds the maximum likelihood estimates of the parameters of the model and the hidden states given the observed data.
  • Hierarchical Clustering: A method that groups the data into a hierarchy of clusters by merging or splitting clusters based on a linkage criterion.
  • Huffman coding: a method for lossless data compression.
  • Hybrid Methods: Combinations of the methods above, such as Grid Search combined with Random Search for hyperparameter tuning, or Random Forest combined with XGBoost in an ensemble.
  • Hyperband: A method of hyperparameter tuning that uses an iterative process of training models with different hyperparameter configurations and different resource budgets, and selecting the best models at each stage to continue training with more resources.
  • Hyperopt: A Python library for hyperparameter optimization that supports search algorithms such as Random Search and Tree-structured Parzen Estimators (TPE), a Bayesian optimization method.
  • Hyperparameter Tuning (Hyperparameter Optimization): The process of finding the best values for a model’s hyperparameters, the settings that are not learned during training, such as the learning rate or the number of hidden layers in a neural network, typically using search algorithms.
  • Image captioning: a technique used in computer vision to generate a natural language description of an image.
  • Imitation Learning: a method that trains a model to mimic the behavior of an expert by learning from demonstration data.
  • Importance Sampling: A method that estimates expectations under a target probability distribution by drawing samples from a different proposal distribution and weighting each sample by the ratio of the target density to the proposal density.
  • Independent Component Analysis (ICA): A method that finds independent components of the data, which are linear combinations of the original features that are as independent as possible.
  • Interior Point: A method that uses an interior point algorithm to find the optimal solution of a convex optimization problem.
  • Isolation Forest: A method that isolates individual observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.
  • Iterative: a type of algorithm that repeatedly performs a set of instructions to solve a problem.
  • K-Fold Cross-validation: A specific type of cross-validation where the data is divided into k folds and the model is trained and validated k times, each time using a different fold as the validation set.
  • K-means: a method that clusters the data into k groups by iteratively assigning each data point to the cluster with the nearest centroid and updating each centroid to the mean of the data points assigned to it (see the sketch after this list).
  • K-nearest neighbors: a method for classification by finding the k closest data points to a given test point and returning the most common class among them.
  • Kruskal’s algorithm: a method to find a minimum spanning tree in a graph.
  • L1 Regularization: A method that adds a penalty term to the loss function proportional to the absolute value of the parameters.
  • L1-L2 Regularization: A method that adds both L1 and L2 penalty terms to the loss function.
  • L2 Regularization: A method that adds a penalty term to the loss function proportional to the square of the parameters.
  • Label smoothing: A regularization technique that modifies the one-hot encoded labels by replacing the 1’s with a value slightly less than 1 (distributing the remainder over the other classes), encouraging the model to produce less confident predictions.
  • Language model: a model that is trained to predict the next word in a sequence of words, used in natural language processing tasks such as text generation, text completion, and machine translation.
  • L-BFGS: A quasi-Newton method, an optimization algorithm that approximates the second derivative of the function to be optimized.
  • Leave-One-Out Cross-validation: A specific type of cross-validation where the model is trained on all but one data point, and validated on the left out data point. This method is used when the data size is small.
  • Lemmatization: The process of reducing words to their dictionary base form (lemma) using vocabulary and morphological analysis, so that different inflections of a word are recognized as the same word.
  • LightGBM: A gradient boosting framework that uses tree-based learning with optimizations such as histogram-based decision tree learning and leaf-wise tree growth, designed to be efficient and scalable.
  • Likelihood: The probability of the data given the hypothesis.
  • Local Outlier Factor (LOF): A method that measures the local density of a data point compared to its neighbors and identifies data points that have a lower density than their neighbors as outliers.
  • LSTM (Long Short-Term Memory): a type of recurrent neural network that uses gating mechanisms and memory cells to control the flow of information in the hidden state, allowing it to maintain information across many timesteps and avoid the vanishing gradient problem; commonly used in natural language processing and speech recognition.
  • Machine Learning: A field of study that focuses on the development of algorithms and models that can learn from data and make predictions or decisions without being explicitly programmed.
  • Mahalanobis Distance: A method that calculates the distance of a data point from the mean of the data in multidimensional space while taking into account the covariance of the data.
  • Markov Chain Monte Carlo (MCMC): a class of algorithms for sampling from a probability distribution by constructing a Markov chain that has the desired distribution as its equilibrium distribution and generating samples from the chain.
  • Matrix factorization: a method for decomposing a matrix into the product of two or more matrices.
  • MCMC (Markov Chain Monte Carlo): a class of algorithms for approximating complex distributions by generating samples from the distribution using a Markov Chain.
  • Metaheuristic: an algorithm that guides other heuristics to find a solution.
  • Meta-learning: a method that trains a model to learn how to learn, by learning from a distribution of tasks and adapting its parameters quickly to new tasks with limited data.
  • Meta-RL (Meta-Reinforcement Learning): a method that uses reinforcement learning to learn how to learn, by learning from a distribution of tasks and adapting its parameters quickly to new tasks with limited data.
  • Metropolis-Hastings: A method that generates samples from a probability distribution by proposing a new sample from a proposal distribution and accepting or rejecting it based on the ratio of the target distribution to the proposal distribution.
  • Mini-batch Gradient Descent: A variant of stochastic gradient descent that uses a small batch of data points, rather than a single data point or the full dataset, to estimate the gradient at each step.
  • Mixup: A regularization technique that trains the model on virtual examples created by interpolating the features and labels of the original examples.
  • Momentum: A variant of gradient descent that uses a moving average of the previous gradients to add a “momentum” term to the update, which can help the optimization converge more quickly and avoid getting stuck in local minima.
  • Monte Carlo (MC) Learning: A method that estimates the value of a state or state-action pair by averaging the rewards obtained after visiting it.
  • Monte Carlo Sampling: A method that generates samples from a probability distribution by randomly sampling from it.
  • Monte Carlo Tree Search (MCTS): a method that uses random sampling to explore possible future moves and select the best move in a game or decision-making scenario.
  • Multi-armed Bandit: A method for solving the exploration-exploitation trade-off by balancing the exploration of new options and the exploitation of the best current option.
  • Multi-task Learning: A method of training a model to perform multiple tasks simultaneously, in order to improve the performance of all tasks or transfer knowledge between tasks.
  • Mutation: The process of making small random changes to the genetic material of an individual.
  • Naive Bayes: a method for classification based on Bayes’ theorem with the assumption that all features are independent.
  • Named Entity Recognition (NER): The process of identifying and classifying named entities in text, such as people, organizations, locations, etc.
  • Natural Language Processing (NLP): A field of study that focuses on the interaction between computers and human languages, and encompasses a wide range of techniques for processing and analyzing text and speech data.
  • Nesterov Momentum: A variant of gradient descent that uses the gradient of the future position in the optimization trajectory, rather than the current position, to calculate the momentum term. This can help the optimization converge more quickly.
  • Neural Architecture Search (NAS): A method that uses a search algorithm, often driven by machine learning techniques, to automatically find the best neural network architecture for a specific task.
  • Neural Machine Translation (NMT): a type of model based on neural networks that is used for machine translation.
  • Neural network: an algorithm that is inspired by the structure and function of the human brain, used to recognize patterns in data and make predictions.
  • Neural ODE (Ordinary Differential Equations): a method that uses continuous dynamics to model the evolution of the hidden states in a neural network, instead of discrete layers. This allows for more efficient computation and the ability to model complex, highly structured data.
  • Newton-CG: A method that uses the Newton conjugate gradient algorithm to optimize the parameters.
  • Newton’s method: A method that uses the Newton-Raphson algorithm, a root-finding method that uses the first and second derivatives of the function to be optimized.
  • NF (Normalizing Flow): a method that consists of a series of invertible functions that transform a simple prior distribution into a complex target distribution. The model learns the parameters of these functions through maximum likelihood estimation.
  • NLP-Transformer: a method specifically designed for natural language processing tasks, characterized by the use of self-attention mechanisms, and pre-training on a large corpus of text.
  • Node: A unit of a graph that can represent an object or an entity.
  • Non-negative Matrix Factorization (NMF): A method that factorizes a non-negative matrix into the product of two non-negative matrices.
  • Object detection: a technique used in computer vision to locate and classify objects in an image.
  • One-class SVM: A method that learns a decision boundary that separates the data points from a target class from the rest of the feature space.
  • One-shot Learning: A method that allows a model to learn a new task or class from a single example.
  • Optuna: A Python library for hyperparameter optimization that supports a variety of search algorithms.
  • Part-of-speech (POS) tagging: The process of labeling each word in a sentence with its corresponding part of speech, such as noun, verb, adjective, etc.
  • Policy: The strategy or rule that the agent uses to choose actions.
  • Population: The set of individuals that are evolved by the algorithm.
  • Posterior Probability: The probability of a hypothesis after collecting data, calculated by multiplying the prior probability by the likelihood and normalizing by the marginal likelihood (see the worked example after this list).
  • PPO (Proximal Policy Optimization): A policy gradient method for reinforcement learning that improves stability by constraining each policy update so the new policy does not move too far from the old one, and uses a value function to estimate the advantage of each action.
  • Pre-training: a method of training a neural network on a large dataset before fine-tuning it on a specific task.
  • Prim’s algorithm: a method to find a minimum spanning tree in a graph.
  • Principal Component Analysis (PCA): a dimensionality reduction method that finds the principal components of the data (the directions of maximum variance) and projects the data onto these directions.
  • Prior Probability: The probability of a hypothesis before collecting any data.
  • Q-Learning: A reinforcement learning method that estimates the optimal action-value function, the expected cumulative reward of each state-action pair, and uses it to choose actions (see the sketch after this list).
  • Random Forest: An ensemble method that combines multiple decision trees, each trained on a random subset of the data and features, and makes the final prediction by averaging the predictions of all the trees.
  • Random Sampling: A method that generates new samples from the original data by randomly selecting data points without replacement.
  • Random Search: A method of hyperparameter tuning that samples random values for each hyperparameter, from a specified range or probability distribution, and trains a model for each sampled combination.
  • Random Subsampling: A method that generates new samples from the original data by randomly selecting a subset of the data.
  • Randomized algorithm: an algorithm that uses randomness to solve a problem.
  • Recurrent Neural Network (RNN): a type of neural network well suited to sequential data such as time series, speech, and text; it uses recurrent layers to maintain a hidden state that captures information about past inputs and uses this state to process the current input and produce the current output.
  • Recursive: a type of algorithm that calls itself to solve a problem.
  • Regularization: A technique that prevents overfitting by adding a penalty term to the loss function that discourages overly complex models, for example by shrinking the parameters toward zero or constraining them to be small.
  • Reinforcement Learning (RL): A type of machine learning in which an agent learns to make decisions by interacting with an environment through trial and error, receiving rewards or penalties for its actions and learning to maximize the cumulative reward.
  • Reward: The feedback signal provided by the environment to the agent, indicating how good or bad its actions are.
  • RMSprop: An optimization algorithm (a variant of gradient descent) that adapts the learning rate for each parameter by dividing the gradient by a running average of its recent magnitudes.
  • RNN (Recurrent Neural Network): a type of neural network characterized by the use of recurrent layers which allow information to be passed from one step of the network to the next.
  • RNN-Transducer: a type of neural network that combines the power of RNNs and Transducers for sequence-to-sequence tasks such as machine translation and speech recognition.
  • RoBERTa (Robustly Optimized BERT Pre-training): a type of transformer-based neural network pre-trained on a large corpus of text, optimized for natural language understanding tasks such as question answering and sentiment analysis.
  • SARSA (State-Action-Reward-State-Action): A variant of Q-Learning that updates the value of each state-action pair using the action actually taken next under the current policy, rather than the optimal one, making it an on-policy method.
  • Selection: The process of choosing individuals from the population to participate in the next generation based on their fitness.
  • Self-Attention: a type of attention mechanism that allows a model to weigh different parts of its input or intermediate representations based on their relationships with each other.
  • Self-Organizing Map (SOM): a method that projects the data into a low-dimensional grid of nodes and learns a topology-preserving map of the data based on competitive learning.
  • Semantic segmentation: a technique used in computer vision to classify each pixel in an image into a predefined set of classes.
  • Semi-supervised Learning: A type of machine learning where the model is trained on a small amount of labeled data and a large amount of unlabeled data.
  • Sentiment Analysis: The process of determining the sentiment or emotion expressed in text, such as positive, negative or neutral.
  • Seq2Seq (Sequence to Sequence): a type of model used for tasks such as machine translation and text summarization, which maps an input sequence to an output sequence.
  • Simplex algorithm: a method for solving linear programming problems by moving along the vertices of the feasible region until the optimal solution is reached.
  • Singular Value Decomposition (SVD): A method that factorizes a matrix into the product of three matrices (two orthogonal matrices and a diagonal matrix of singular values); the leading singular vectors correspond to the principal components of the data.
  • Space complexity: the amount of memory an algorithm uses as a function of the size of the input.
  • Spectral Clustering: A method that finds clusters by constructing a similarity matrix from the data and finding the eigenvectors of this matrix that correspond to the largest eigenvalues.
  • Stacking: A method that trains multiple models independently and then uses their predictions as input to another model that makes the final predictions.
  • State: The representation of the current condition of the environment.
  • Stemming: The process of reducing words to a root form by stripping affixes with simple heuristic rules, so that related word forms are treated as the same word.
  • Stochastic Gradient Descent (SGD): A variant of gradient descent that estimates the gradient at each step from a single randomly selected data point (or a small random batch) rather than the full dataset, and updates the parameters in the direction of the negative gradient of the loss.
  • Stochastic Weight Averaging (SWA): A method for improving generalization in deep learning by averaging the weights of multiple models trained with SGD.
  • Stratified Cross-validation: A specific type of cross-validation that ensures that each fold has a representative ratio of each class in the data.
  • Supervised Learning: A type of machine learning where the model is trained on labeled data, where the correct output is provided for each input.
  • Support vector machine: a method for classification and regression that finds the best boundary between different classes in a high-dimensional space.
  • T5 (Text-to-Text Transfer Transformer): a type of transformer-based neural network that can be fine-tuned on a wide range of natural language processing tasks by learning to map a given input to a specific output format.
  • Temporal-Difference (TD) Learning: A method that learns the value function of a policy (the expected future reward of each state or action) by updating the value estimate of a state or state-action pair based on the difference between the current estimate and the reward plus the estimated value of the next state or state-action pair.
  • Thompson Sampling: A method for solving the exploration-exploitation trade-off by sampling from the posterior distribution of the performance of each option.
  • Time complexity classes: common ways of classifying the time complexity of algorithms, such as O(1), O(n), O(log n), and O(n^2).
  • Time complexity: the amount of time it takes for an algorithm to run as a function of the size of the input.
  • Tokenization: The process of splitting text into individual words, phrases, or symbols.
  • Topic Modeling: The process of identifying the main topics or themes in a collection of text documents.
  • Topological sorting: a method to order the vertices in a directed acyclic graph such that for every directed edge uv, vertex u comes before v in the ordering.
  • Transfer Learning: A method that uses knowledge learned from one task or dataset to improve performance on a different but related task, for example by using a pre-trained model as the starting point for training on the new task.
  • Transformer: a type of neural network that uses self-attention mechanisms to process input sequences and generate output sequences, particularly well suited to natural language processing tasks.
  • Transformers-based models for speech recognition: a type of neural network that uses the transformer architecture for speech recognition tasks, such as automatic speech recognition (ASR) and text-to-speech (TTS).
  • Transformer-XL: a type of transformer-based neural network that is able to handle longer input sequences and maintain context information for improved performance on tasks such as language modeling and machine translation.
  • TRPO (Trust Region Policy Optimization): A method that uses a trust region constraint to ensure that the new policy is not too different from the old one.
  • t-SNE (t-Distributed Stochastic Neighbor Embedding): A method that projects the data into a low-dimensional space while preserving the local neighborhood structure (pairwise similarities) of the data points.
  • UCB (Upper Confidence Bound): A method for solving the exploration-exploitation trade-off by balancing the exploration of new options with the exploitation of the best current option, based on the upper confidence bound of each option’s estimated performance.
  • ULMFiT (Universal Language Model Fine-tuning): a transfer learning method that fine-tunes a pre-trained language model on a new task using a small amount of labeled data.
  • UMAP (Uniform Manifold Approximation and Projection): A method that maps the data to a low-dimensional space while preserving the global structure of the data.
  • Undirected Graph: A graph where the edges do not have a direction, and the edges are unordered pairs of nodes (u,v) where u and v are connected nodes.
  • Unsupervised Learning: A type of machine learning where the model is trained on unlabeled data and is expected to find patterns or structure in the data.
  • VAE (Variational Autoencoder): a type of generative model consisting of an encoder that maps the input data to a lower-dimensional latent representation, a decoder that maps the latent representation back to the original data space, and a regularization term that encourages the latent representation to follow a prior distribution such as a standard normal, so that new samples can be generated from the latent space.
  • Variational Inference: a method for approximating intractable posterior distributions in Bayesian models using optimization techniques and a parametric family of distributions.
  • WaveNet: a type of generative model for speech synthesis that generates speech one sample at a time using a convolutional neural network.
  • Weight Decay: A regularization technique that adds a penalty term to the loss function proportional to the square of the weights, preventing overfitting by encouraging the weights to be small.
  • Weighted Graph: A graph where each edge has a weight or a value associated with it.
  • XGBoost: An optimized gradient boosting framework that uses an ensemble of decision trees with regularization and an efficient algorithm for finding the best split points.
  • XLNet: A model that is trained to maximize the likelihood of the permuted data, allowing it to handle a wider range of input dependencies than BERT.
  • Zero-shot Learning: A method that allows a model to learn a new task or class without any examples of that task or class.
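The following are a few minimal, illustrative Python sketches of selected entries above; they use toy data and simplified settings invented for this glossary rather than any particular library’s implementation. First, backpropagation for a single neuron y_hat = w * x + b with squared-error loss, where the gradients follow directly from the chain rule (the training pair, learning rate, and iteration count are arbitrary choices):

    x, y = 2.0, 7.0                 # one (x, y) training pair
    w, b = 0.0, 0.0                 # parameters to learn
    learning_rate = 0.05

    for _ in range(500):
        y_hat = w * x + b                   # forward pass
        # backward pass: chain rule from the loss (y_hat - y)^2 down to w and b
        dloss_dyhat = 2.0 * (y_hat - y)
        dloss_dw = dloss_dyhat * x          # dL/dw = dL/dy_hat * dy_hat/dw
        dloss_db = dloss_dyhat * 1.0        # dL/db = dL/dy_hat * dy_hat/db
        w -= learning_rate * dloss_dw       # gradient descent step
        b -= learning_rate * dloss_db

    print(round(w * x + b, 3))              # the prediction approaches the target 7.0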
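A k-fold cross-validation sketch for the cross-validation entries, scoring a simple least-squares slope on synthetic one-dimensional data (the data, the number of folds, and the model are arbitrary choices for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=100)
    y = 2.0 * X + rng.normal(scale=0.1, size=100)   # synthetic regression data

    k = 5
    folds = np.array_split(rng.permutation(len(X)), k)

    scores = []
    for i in range(k):
        val_idx = folds[i]                                                # held-out fold
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # "train": fit a least-squares slope on the remaining folds
        slope = np.dot(X[train_idx], y[train_idx]) / np.dot(X[train_idx], X[train_idx])
        # "validate": mean squared error on the held-out fold
        scores.append(np.mean((y[val_idx] - slope * X[val_idx]) ** 2))

    print(np.mean(scores))   # average validation error across the k folds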
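A sketch of Dijkstra’s algorithm on a small hypothetical weighted graph with non-negative edge weights (the graph itself is made up for the example):

    import heapq

    graph = {                       # adjacency list: node -> list of (neighbor, weight)
        "A": [("B", 1), ("C", 4)],
        "B": [("C", 2), ("D", 5)],
        "C": [("D", 1)],
        "D": [],
    }

    def dijkstra(graph, source):
        dist = {node: float("inf") for node in graph}
        dist[source] = 0
        heap = [(0, source)]                      # priority queue of (distance, node)
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:                       # stale queue entry, skip it
                continue
            for v, w in graph[u]:
                if d + w < dist[v]:               # relax edge (u, v)
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist

    print(dijkstra(graph, "A"))    # {'A': 0, 'B': 1, 'C': 3, 'D': 4}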
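A gradient descent sketch for the gradient descent entry, minimizing the one-dimensional function f(x) = (x - 3)^2; the learning rate and starting point are arbitrary choices:

    def grad(x):
        return 2.0 * (x - 3.0)          # derivative of f(x) = (x - 3)^2

    x = 0.0                             # initial guess
    learning_rate = 0.1
    for _ in range(100):
        x -= learning_rate * grad(x)    # step in the direction of the negative gradient

    print(round(x, 4))                  # converges toward the minimizer x = 3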
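A k-means sketch for the K-means entry, clustering two synthetic two-dimensional blobs; the data, the initialization, and the fixed iteration count are simplifications for the example:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)),      # blob around (0, 0)
                   rng.normal(5, 0.5, (20, 2))])     # blob around (5, 5)
    k = 2
    centroids = X[[0, 20]].copy()    # one starting point from each blob, for a deterministic demo

    for _ in range(10):
        # assignment step: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    print(centroids)   # approximately the two blob centers, near (0, 0) and (5, 5)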
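A worked example for the posterior probability entry, using made-up numbers for a diagnostic test (1% prevalence, 99% sensitivity, 5% false-positive rate):

    prior = 0.01                       # P(condition)
    likelihood = 0.99                  # P(positive test | condition)
    false_positive_rate = 0.05         # P(positive test | no condition)

    # marginal likelihood: total probability of observing a positive test
    evidence = likelihood * prior + false_positive_rate * (1 - prior)

    posterior = likelihood * prior / evidence   # P(condition | positive test)
    print(round(posterior, 3))                  # about 0.167: far from conclusive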
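A tabular Q-learning sketch for the Q-Learning entry, on a toy five-state chain environment invented for the example; the agent starts at state 0 and receives a reward of 1 for reaching state 4:

    import random

    n_states, n_actions = 5, 2                       # actions: 0 = left, 1 = right
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma, epsilon = 0.5, 0.9, 0.1            # learning rate, discount, exploration rate

    def step(state, action):
        next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        return next_state, reward, next_state == n_states - 1

    for episode in range(300):
        s = 0
        for _ in range(100):                         # step cap per episode
            if random.random() < epsilon:
                a = random.randrange(n_actions)      # explore
            else:
                best = max(Q[s])                     # exploit, breaking ties at random
                a = random.choice([i for i in range(n_actions) if Q[s][i] == best])
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break

    print([round(max(row), 2) for row in Q])   # max-Q rises toward the goal (terminal state stays 0)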

These entries cover commonly used terminology and definitions across general algorithms, anomaly detection, Bayesian statistics, clustering, dimensionality reduction, evaluation and sampling, evolutionary computation, graph theory, hyperparameter tuning, machine learning, NLP, optimization, regularization and overfitting prevention, and reinforcement learning. I hope this information is helpful.