Marshal Sinaga | Publications

2020

ICACSIS 2020
Least Square Adversarial Autoencoder

Sinaga, Marshal, and Stefanus, Lim Yohanes

2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS) 2020

Abs Cite Paper Code

This research introduces least square adversarial autoencoder (LSAA)-an autoencoder that is able to reconstruct data and also generate data that has characteristics similar to data distribution from the prior distribution. LSAA uses least square generative adversarial network loss function on its discriminator. LSAA minimizes Pearson χ 2 divergence between the latent variable distribution and the prior distribution. In this research, a Python program is developed to model LSAA by utilizing MNIST data set and FashionMNIST data set. The program is implemented using PyTorch. All of the programming activities are carried out in the cloud environment provided by the Tokopedia-Universitas Indonesia AI Center, using DGX-1 (GPU Tesla V100) as its computing resource. The experimental results show that the mean squared error of LSAA for MNIST data set and FashionMNIST data set are 0.0080 and 0.0099, respectively. Furthermore, the Fréchet Inception Distance score of LSAA for MNIST data set and FashionMNIST data set are 11.1280 and 27.5737, respectively. These results indicate that the least square adversarial autoencoder is able to reconstruct the image properly and also able to generate images similar to the training samples.
@article{sinaga2020least, bibtex_show = {true}, title = {Least Square Adversarial Autoencoder}, author = {Sinaga, Marshal and Stefanus, Lim Yohanes}, year = {2020}, journal = {2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS)}, pages = {33--40}, publisher = {IEEE,}, paper = {https://www.researchgate.net/publication/344516945_Least_Square_Adversarial_Autoencoder}, abbr = {ICACSIS 2020}, code = {https://github.com/MarshalArijona/least-square-adversarial-autoencoder} }
Bachelor’s thesis: Least Square Adversarial Autoencoder and Its Application for Image Reconstruction and Image Generation (in Bahasa Indonesia)

Sinaga, Marshal

2020

Abs PDF

This Final Project (Tugas Akhir) investigates the least square adversarial autoencoder that uses least square generative adversarial network as its discriminator. The discriminator minimizes the Pearson χ2 divergence between the latent variable distribution and the prior distribution. The presence of discriminator allows the autoencoder to generate data that has characteristics that resemble the original data. Python programs were developed to model the least square adversarial autoencoder. This programs try to model two types of autoencoder namely unsupervised least square adversarial autoencoder and supervised least square adversarial autoencoder by utilizing MNIST dataset and FashionMNIST dataset. The unsupervised least square adversarial autoencoder uses latent variables of dimension 20 while the supervised least square adversarial autoencoder uses latent variables with dimensions of 2, 3, 4, and 5, respectively. This programs were implemented using PyTorch and executed using Jupyter Notebook. All of the programming activities are carried out in the cloud environment provided by Floydhub and Tokopedia-UI AI Center, respectively using NVIDIA Tesla K80 GPU and NVIDIA Tesla V100 GPU as their computing resource. Training time in unsupervised least square adversarial autoencoder lasts for two hours while in supervised least square adversarial autoencoder lasts for six hours. The Results of experiments show that the mean squared error of unsupervised least square adversarial autoencoder for MNIST dataset and FashionMNIST dataset are 0.0063 and 0.0094, respectively. Meanwhile, the mean squared error of supervised least square adversarial autoencoder for MNIST dataset is 0.0033. Furthermore, the Fr ́echet Inception Distance scores of unsupervised least square adversarial autoencoder for MNIST dataset and FashionMNIST dataset are 15.7182 and 38.6967, respectively. Meanwhile, the value of Fr ́echet Inception Distance score of supervised least square adversarial autoencoder in MNIST dataset is 62.512. These results indicate that the least square adversarial autoencoder is able to reconstruct the image properly, but is less able to generate images with the same quality as the learning sample.

2021

Master’s Thesis: Transformation-Equivariant Representation Learning with Barber-Agakov and Noise Contrastive Mutual Information Estimation (in Bahasa Indonesia)

Sinaga, Marshal

2021

Abs PDF

Convolution neural network (CNN) has shown promising results on various image classification tasks. One of the reasons due to the ability of CNN to extract representation that is equivariant to transformations. However, the notion only holds for the translation transformation. This research introduces variational transformation equivariant (VTE), a more general unsupervised transformation-equivariant representation model. During the implementation, VTE utilizes the Predictive-transformation, a self-supervised learning model that acts as inductive bias. The optimization of VTE involves two lower bound mutual information methods: Barber-Agakov and information noise contrastive (InfoNCE). The VTE models are evaluated based on the average error rate on image classification tasks on CIFAR-10 and STL-10 datasets. We utilize multi-layer perceptron, K-nearest neighbor, and multinomial logistic regression as the classifiers. Results show VTE with Barber-Agakov and VTE with InfoNCE outperform the baseline model for each classifier on both datasets. Specifically, VTEBA consistently achieves the lowest average error rate for both datasets.
IWBIS 2021
Variational Contrastive Log Ratio Upper Bound of Mutual Information for Training Generative Models

Sinaga, Marshal, Alhamidi, Machmud Roby, Rachmadi, Muhammad Febrian, and Jatmiko, Wisnu

2021 6th International Workshop on Big Data and Information Security (IWBIS) 2021

Abs Cite Paper Code

Theoretically, a Generative adversarial network minimizes the Jensen-Shannon divergence between real data distribution and generated data distribution. This divergence is another form of mutual information between a mixture distribution and a binary distribution. It implies that we can build a similar generative model by optimizing the mutual information. This research proposes variational contrastive log-ratio upper bound vCLUB mutual information estimation on mixture distribution and the optimization algorithm to train two generative models. We call the models CLUB-sampling generative network (vCLUBsampling GN) and vCLUB-non sampling generative network (vCLUB-non sampling GN). The results show that vCLUBsampling outperforms GAN and vCLUB-non sampling GN on the MNIST dataset and has competitive results with GAN on the CIFAR-10 dataset. However, GAN outperforms vCLUB-non sampling GN on both datasets.
@article{sinaga2021club, bibtex_show = {true}, title = {Variational Contrastive Log Ratio Upper Bound of Mutual Information for Training Generative Models}, author = {Sinaga, Marshal and Alhamidi, Machmud Roby and Rachmadi, Muhammad Febrian and Jatmiko, Wisnu}, journal = {2021 6th International Workshop on Big Data and Information Security (IWBIS)}, pages = {9--16}, year = {2021}, publisher = {IEEE,}, code = {https://github.com/MarshalArijona/CLUB-Generative-Network}, paper = {https://www.researchgate.net/publication/349881819_Variational_Contrastive_Log_Ratio_Upper_Bound_of_Mutual_Information_for_Training_Generative_Model}, abbr = {IWBIS 2021} }
ICONIP 2021
Tile2Vec with Predicting Noise for Land Cover Classification

Sinaga, Marshal, Ali, Fadel Muhammad, and Arymurthy, Aniati Murni

International Conference on Neural Information Processing 2021

Abs Cite Paper Code

Tile2vec has proven to be a good representation learning model in the remote sensing field. The success of the model depends on l2-norm regularization. However, l2-norm regularization has the main drawback that affects the regularization. We propose to replace the l2-norm with regularization with predicting noise framework. We then develop an algorithm to integrate the framework. We evaluate the model by using it as a feature extractor on the land cover classification task. The result shows that our proposed model outperforms all the baseline models.
@article{sinaga2021tile2vec, bibtex_show = {true}, title = {Tile2Vec with Predicting Noise for Land Cover Classification}, author = {Sinaga, Marshal and Ali, Fadel Muhammad and Arymurthy, Aniati Murni}, journal = {International Conference on Neural Information Processing}, pages = {87--99}, year = {2021}, publisher = {Springer,}, paper = {https://www.researchgate.net/publication/356458961_Tile2Vec_with_Predicting_Noise_for_Land_Cover_Classification}, abbr = {ICONIP 2021}, code = {https://github.com/MarshalArijona/tile2vec-with-predicting-noise/tree/master} }
On study of Variational Inference

Sinaga, Marshal

2021

Abs PDF

The main problem of Bayesian inference is computing the posterior which is often intractable. In this paper, we review variational inference (VI) methods that aim to find a variational distribution to approximate the true posterior. The review starts with the original form of variational inference. Then we also discuss the extension of VI: mean-field, stochastic and black-box variational, and amortized inference. Finally, we review the recent improvement of variational inference.
On Study of Mutual Information and Its Estimation Methods

Sinaga, Marshal

2021

Abs PDF

The presence of mutual information in the research of deep learning has grown significantly. It has been proven that mutual information can be a good objective function to build a robust deep learning model. Most of the researches utilize estimation methods to approximate the true mutual information. This technical report delivers an extensive study about definitions as well as properties of mutual information. This article then delivers some reviews and current drawbacks of mutual information estimation methods afterward.

2022

ICPRAM 2022
Transformation Equivariant Representation Learning with Barber Agakov and Info-NCE Mutual Information Estimation

Sinaga, Marshal, Basarrudin, T., and Krisnadhi, Adila Alfa

International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2022

Abs Cite Paper Code

The success of deep learning on computer vision tasks is due to the convolution layer that equivaries to the translation transformation. Several works attempt to extend the notion of equivariance into more general transformations. Autoencoding variational transformation (AVT) achieves state of art by approaching the problem from the information theory perspective. The model involves the computation of mutual information, which leads to a more general transformation equivariant representation model. In this research, we investigate the alternatives of AVT called variational transformation equivariant (VTE). We utilize the Barber-Agakov and Info-NCE mutual information estimation to optimize VTE. Furthermore, we also propose a sequential mechanism to train our VTE. Results of experiments demonstrate that VTE outperforms AVT on image classification tasks.
@article{sinaga2022vte, bibtex_show = {true}, title = {Transformation Equivariant Representation Learning with Barber Agakov and Info-NCE Mutual Information Estimation}, author = {Sinaga, Marshal and Basarrudin, T. and Krisnadhi, Adila Alfa}, journal = {International Conference on Pattern Recognition Applications and Methods (ICPRAM)}, year = {2022}, publisher = {Scitepress,}, paper = {https://www.researchgate.net/publication/356459026_Transformation_Equivariant_Representation_Learning_with_Barber_Agakov_and_Info-NCE_Mutual_Information_Estimation}, abbr = {ICPRAM 2022}, code = {https://github.com/MarshalArijona/VTE/tree/master} }

2023

RealML 2023

Preferential Heteroscedastic Bayesian Optimization with Informative Noise Priors

Sinaga, Marshal, Martinelli, Julien, and Kaski, Samuel

NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World 2023

Abs PDF

Preferential Bayesian optimization (PBO) is a sample-efficient framework for optimizing a black-box function by utilizing human preferences between two candidate solutions as a proxy. Conventional PBO relies on homoscedastic noise to model human preference struc- ture. However, such noise fails to accurately capture the varying levels of human aleatoric uncertainty among different pairs of candidates. For instance, a chemist with solid expertise in glucose-related molecules may easily compare two compounds and struggle for alcohol- related molecules. Furthermore, PBO ignores this uncertainty when searching for a new candidate, consequently underestimating the risk associated with human uncertainty. To address this, we propose heteroscedastic noise models to learn human preference structure. Moreover, we integrate the preference structure with the acquisition functions that account for aleatoric uncertainty. The noise models assign noise based on the distance of a specific input to a predefined set of reliable inputs known as anchors. We empirically evaluate the proposed approach on a range of synthetic black-box functions, demonstrating a consistent improvement over homoscedastic PBO.

2024

preprint

Preferential Heteroscedastic Bayesian Optimization with Informative Noise Distribution

Sinaga, Marshal, Martinelli, Julien, Garg, Vikas, and Kaski, Samuel

arXiv preprint 2024

Abs Paper

Preferential Bayesian optimization (PBO) is a sample-efficient framework for learning human preferences between candidate designs. PBO classically relies on homoscedastic noise models to represent human aleatoric uncertainty. Yet, such noise fails to accurately capture the varying levels of human aleatoric uncertainty, particularly when the user possesses partial knowledge among different pairs of candidates. For instance, a chemist with solid expertise in glucose-related molecules may easily compare two compounds from that family while struggling to compare alcohol-related molecules. Currently, PBO overlooks this uncertainty during the search for a new candidate through the maximization of the acquisition function, consequently underestimating the risk associated with human uncertainty. To address this issue, we propose a heteroscedastic noise model to capture human aleatoric uncertainty. This model adaptively assigns noise levels based on the distance of a specific input to a predefined set of reliable inputs known as anchors provided by the human. Anchors encapsulate partial knowledge and offer insight into the comparative difficulty of evaluating different candidate pairs. Such a model can be seamlessly integrated into the acquisition function, thus leading to candidate design pairs that elegantly trade informativeness and ease of comparison for the human expert. We perform an extensive empirical evaluation of the proposed approach, demonstrating a consistent improvement over homoscedastic PBO.

2025

NeurIPS 2025

Robust and Computation-Aware Gaussian Processes

Sinaga, Marshal, Martinelli, Julien, and Kaski, Samuel

Advances in neural information processing systems 39 (NeurIPS 2025) 2025

Abs Paper Code Video

Gaussian processes (GPs) are widely used for regression and optimization tasks such as Bayesian optimization (BO) due to their expressiveness and principled uncertainty estimates. However, in settings with large datasets corrupted by outliers, standard GPs and their sparse approximations struggle with computational tractability and robustness. We introduce Robust Computation-aware Gaussian Process (RCaGP), a novel GP model that jointly addresses these challenges by combining a principled treatment of approximation-induced uncertainty with robust generalized Bayesian updating. The key insight is that robustness and approximation-awareness are not orthogonal but intertwined: approximations can exacerbate the impact of outliers, and mitigating one without the other is insufficient. Unlike previous work that focuses narrowly on either robustness or approximation quality, RCaGP combines both in a principled and scalable framework, thus effectively managing both outliers and computational uncertainties introduced by approximations such as low-rank matrix multiplications. Our model ensures more conservative and reliable uncertainty estimates, a property we rigorously demonstrate. Additionally, we establish a robustness property and show that the mean function is key to preserving it, motivating a tailored model selection scheme for robust mean functions. Empirical results confirm that solving these challenges jointly leads to superior performance across both clean and outlier-contaminated settings, both on regression and high-throughput Bayesian optimization benchmarks.