Author: Gianluca Marchio
Date of submission: 28.04.2021



This document deals with image segmentation. The topic is introduced with a historical overview of segmentation techniques that predate modern machine learning, followed by a general explanation. Considerable space is then devoted to machine-learning-based image segmentation techniques and their most common applications (with particular attention to the biomedical field), their advantages and disadvantages, and future innovations in this sector.


In research and image processing, one is often interested only in certain parts of an image, as they are more significant; these parts are often referred to as targets or foregrounds. Image segmentation therefore means the process of dividing the image into characteristic areas, which allows more effective subsequent processing by extracting objects of interest from those specific areas.(1)

Specifically, segmentation is the process by which image pixels that share common characteristics are classified: each pixel in a region is similar to the others in the same region for some property or characteristic (color, intensity or texture), while adjacent regions differ significantly in at least one of these characteristics. The result of a segmented image is a collection of significant segments that collectively cover the entire image.(2)

A segmented left human femur model is shown below as an  example. The significant regions are the outer surface (red), the  surface between the cortical bone and the spongy bone (green)  and the surface of the bone marrow (blue).


The development of image segmentation has been a long accumulation of new technologies which, combined with external conditions, produced qualitative changes. Before 2000, for example, several methods were used in digital image processing: threshold segmentation, region segmentation, edge segmentation, texture features, clustering and so on. These methods do not rely on the more recent techniques of machine learning; some of them are detailed below:(3)

Thresholding, for example, is the simplest way to segment  images without resorting to more complex technologies. From  a grayscale image, the method can be used to create binary  images. 

The method consists in classifying each pixel as an "object pixel" or a "background pixel" depending on whether or not its value exceeds a certain predetermined threshold. Usually the output binary image has a value of "1" where the object is present and "0" for the background, thus yielding a black-and-white image. The threshold is therefore the key parameter and gives the method its name; its value can be chosen by hand or automatically. A simple approach is to choose the mean or the median as the threshold value. This logic can work well on a noise-free image; otherwise more sophisticated and precise techniques are needed, such as building a histogram and choosing as the threshold the value at the lowest point between its peaks.
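The rule just described can be sketched in a few lines of Python. The 4x4 toy "image" and the mean-value threshold choice below are illustrative assumptions, not taken from the document's figures:

```python
# Minimal thresholding sketch: classify each pixel as object (1) or
# background (0) depending on whether it exceeds the threshold.

def threshold_segment(image, threshold=None):
    """Return a binary mask: 1 for object pixels, 0 for background."""
    pixels = [p for row in image for p in row]
    if threshold is None:
        # Simple automatic choice mentioned in the text: the mean value.
        threshold = sum(pixels) / len(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in image]

image = [
    [10, 12, 200, 210],
    [11, 13, 205, 198],
    [ 9, 14, 202, 207],
    [12, 10, 199, 201],
]
mask = threshold_segment(image)  # bright right half -> 1, dark left half -> 0
```

With this clearly bimodal toy image the mean lands between the two pixel populations, so the mask separates them cleanly; on noisy real images, as noted above, a histogram-based threshold is more robust.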

An example of an image segmented by thresholding (top) compared to the original (bottom) is given below:(4)

Edge segmentation, instead, is a segmentation method used to mark the points in a digital image where the light intensity changes abruptly. This operation removes much of the information in the image, since the interest lies only in the geometric structures outlined by the edges detected in the image.

The methods for recognizing contours are essentially divided  into two main categories: 

  1. Search-based methods: they recognize edges by looking for the maxima and minima of the first-order derivative of the image. Examples are the Roberts, Prewitt and Sobel operators;
  2. Zero-crossing methods: they look for the points where the second-order derivative passes through zero. An example is the Marr-Hildreth algorithm.
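A minimal sketch of a search-based method, using the Sobel operator named above on a toy image containing a single vertical edge. The pure-Python convolution and the toy image are illustrative; a real implementation would use an image-processing library:

```python
# Sobel edge detection sketch: convolve with horizontal/vertical
# gradient kernels and take the gradient magnitude at each pixel.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude for interior pixels (borders left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# Toy image with a vertical intensity jump between columns 1 and 2.
img = [[0, 0, 9, 9]] * 4
mag = sobel_magnitude(img)  # magnitude peaks along the edge columns
```

The gradient magnitude is large only where the intensity changes abruptly, which is exactly the edge information these methods keep while discarding the rest of the image.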

Below is an example of input (top) and output obtained  (bottom):(5)

Clustering methods, on the other hand, represent a set of multivariate data-analysis techniques aimed at selecting and grouping homogeneous elements in a data set. Clustering techniques are based on measures of similarity or, conversely, of dissimilarity between elements.

In many approaches this measure is conceived in terms of distance in a multidimensional space. The effectiveness of the analyses produced by clustering algorithms depends strongly on the choice of the metric, and therefore on the method chosen for computing the distance.

An example of such an algorithm is QT clustering, invented for gene clustering. In this case the distance between a point and a group of points is calculated using complete linkage, i.e. as the maximum of the distances from the point to each member of the group.

An alternative method that instead makes use of unsupervised machine learning is K-means.(6)
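The K-means idea can be sketched directly on raw pixel intensities. The snippet below is a toy one-dimensional version with k = 2; the pixel values and the min/max initialisation are illustrative assumptions:

```python
# Toy 1-D K-means on pixel intensities: alternately assign each value
# to its nearest center, then move each center to its cluster mean.

def kmeans_1d(values, k=2, iters=20):
    centers = [min(values), max(values)]  # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pixels = [10, 12, 11, 200, 205, 198, 9, 202]
centers = kmeans_1d(pixels)  # converges to the two intensity groups
```

Segmenting an image with this scheme amounts to labelling each pixel with the index of its nearest final center, which is why K-means counts as an unsupervised segmentation method: no labelled training data is needed.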

Returning to the history of the evolution of image segmentation, from 2000 to 2010 four main approaches prevailed: graph theory, clustering, classification, and combinations of clustering and classification.

Finally, since 2010, convolutional neural networks (CNNs) have established themselves as a tool for image segmentation; they are much more efficient, as they can learn hierarchical features automatically.(3)


Image segmentation is the first step in image analysis and processing, for example for pattern recognition, and directly determines the quality of the final analysis result. It is a process that consists in dividing an image into distinct, characteristic regions in such a way that the union of any two adjacent regions is not homogeneous.

A formal definition of image segmentation is shown below: if P() is a homogeneity predicate defined on groups of connected pixels, then segmentation is a partition of the set F into connected subsets or regions (S1, S2, ..., Sn) such that

the homogeneity predicate P(Si) = true for all regions Si, and P(Si ∪ Sj) = false when i ≠ j and Si and Sj are neighbors.
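A toy instance of this definition, with a simple equality test standing in for the homogeneity predicate P(); the labels and regions below are illustrative assumptions:

```python
# Toy check of the homogeneity-predicate definition: P holds on each
# region separately but fails on the union of two adjacent regions.

def P(region):
    """Homogeneity predicate: true if all pixel values are equal."""
    return len(set(region)) == 1

S1 = [0, 0, 0]   # background pixels
S2 = [1, 1]      # object pixels

homogeneous = P(S1) and P(S2)   # each region is uniform on its own
merged = P(S1 + S2)             # the union of the two regions is not
```

This captures why the definition forbids P(Si ∪ Sj) from holding for neighboring regions: if it did, the two regions should have been a single segment in the first place.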

Image segmentation techniques were initially designed for gray-level monochrome images and then extended to color images using R, G and B or their transformations (linear/non-linear). The segmentation of color images has attracted more and more attention, as color images generally carry more information; moreover, thanks to the rapid rise in PC computing power, it is now possible to process color images without difficulty.

The approach to segmentation of monochrome images consists of detecting discontinuities and/or homogeneities between the gray levels in the image, grouping common pixels into their respective regions and then subdividing (segmenting) the image.

The approaches based on homogeneity include thresholding,  clustering, region growing, and region splitting and merging. 

As for color, it is perceived by humans as a combination of R (red), G (green) and B (blue), also known as the three primary colors. From the R, G, B representation, other color spaces can be derived using linear or non-linear transformations. Different color spaces, such as RGB, HSI, CIE L*u*v*, are used in the segmentation of color images, but none of them predominates over the others for all possible types of images.

The components of the RGB space can be represented by the brightness values of the scene obtained through three separate filters (red, green and blue), defined by the following equations:

R = ∫ E(λ) SR(λ) dλ,  G = ∫ E(λ) SG(λ) dλ,  B = ∫ E(λ) SB(λ) dλ

where SR, SG, SB are the color filters applied to the incoming light or radiance E(λ), and λ is the wavelength. The problem with linear color spaces, however, is the high correlation between the three components in terms of intensity: this close dependence makes it difficult to distinguish lights, shadows and shades effectively in color images.
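A discrete sketch of these equations: each channel sums the radiance E(λ) weighted by its filter response over a few sampled wavelengths. The spectra below are purely hypothetical numbers chosen for illustration:

```python
# Discrete approximation of the RGB filter integrals: each channel is
# the radiance weighted by its filter response, summed over samples.

wavelengths = [450, 550, 650]          # nm samples (blue, green, red)
E  = {450: 0.2, 550: 0.5, 650: 0.9}   # hypothetical scene radiance
SR = {450: 0.0, 550: 0.1, 650: 1.0}   # hypothetical red filter
SG = {450: 0.1, 550: 1.0, 650: 0.1}   # hypothetical green filter
SB = {450: 1.0, 550: 0.1, 650: 0.0}   # hypothetical blue filter

def channel(S):
    """Sum E(λ)·S(λ) over the sampled wavelengths."""
    return sum(E[w] * S[w] for w in wavelengths)

R, G, B = channel(SR), channel(SG), channel(SB)
```

Note how all three channels depend on the same radiance E(λ): scaling E scales R, G and B together, which is exactly the intensity correlation the text identifies as the weakness of linear color spaces.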

Fortunately, most grayscale segmentation techniques can be extended to color images, such as histogram thresholding, clustering, region growing, edge detection, fuzzy approaches and neural networks. The segmentation methods can be applied directly to each color component, and the results combined to obtain a segmentation of the overall image.

Segmentation can also be viewed as an image classification problem based on color and spatial features. In this case segmentation methods can be categorized as:

  • Supervised algorithms, for example Maximum Likelihood, Decision Tree, K-Nearest Neighbor, Neural Networks, etc.;
  • Unsupervised algorithms, for example adaptive thresholding, fuzzy C-means, SCT/center split, PCT/median cut, split and merge, multiresolution segmentation.


As previously introduced, image segmentation can also be seen as a computational image classification process. In this case machine learning technology is exploited, which in recent years has taken over from traditional techniques. Classification is then performed based on color and spatial features; to make this possible, the basic architecture in image segmentation consists of an encoder and a decoder:

  • ENCODER: extracts features from the image through filters;
  • DECODER: generates the final output, usually a segmentation mask containing the outline of the object.
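This division of labor can be sketched at the shape level, with a 2x2 max-pooling standing in for the encoder's learned filters and nearest-neighbour upsampling standing in for the decoder's learned upconvolutions. No learning is involved; the snippet only illustrates how resolution is compressed and then restored:

```python
# Shape-level encoder/decoder sketch: the encoder halves resolution,
# the decoder restores it. Real networks learn these operations.

def encode(img):
    """2x2 max-pooling: keep the strongest response per 2x2 block."""
    return [[max(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def decode(feat):
    """Nearest-neighbour upsampling: duplicate each value 2x2."""
    up = []
    for row in feat:
        wide = [v for v in row for _ in (0, 1)]
        up.extend([wide, list(wide)])
    return up

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
feat = encode(img)   # 2x2 feature map (compressed representation)
out = decode(feat)   # back to 4x4, the resolution of the input mask
```

In a real segmentation network the decoder's output at full resolution is the segmentation mask, with one class score per pixel rather than a single value.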

U-Net, for example, is a convolutional neural network initially designed for the segmentation of biomedical images. Its name derives directly from the organization of its architecture, which when displayed resembles the letter U. It consists of two parts: a left part (the contracting path) that captures context, and a right part (the expansive path) that aids precise localization. Each step of the contracting path is made up of two three-by-three convolutions, each followed by a rectified linear unit, and a two-by-two max-pooling operation for downsampling.

FastFCN (Fast Fully Convolutional Network), instead, uses a Joint Pyramid Upsampling (JPU) module to replace dilated convolutions, saving considerable memory and time. It basically consists of a fully convolutional network (the core) and the JPU module for upsampling; the JPU upsamples low-resolution feature maps into high-resolution feature maps.

Another is Gated-SCNN, which differs from the previous ones in its particular two-stream CNN architecture. In this model a separate branch, the shape stream, is used to process shape and boundary information.

DeepLab, instead, features an architecture in which convolutions with upsampled filters are used for tasks involving dense prediction. Segmentation of objects at multiple scales is guaranteed via atrous spatial pyramid pooling. A fully connected conditional random field is combined with the DCNN to improve the localization of object boundaries. Atrous convolution is achieved by upsampling the filters through the insertion of zeros, or by sparse sampling of the input feature maps.

Finally, we present Mask R-CNN, in which objects are classified and localized using a bounding box, together with semantic segmentation that permits the classification of each pixel into a set of categories. In Mask R-CNN each region of interest is associated with a segmentation mask, and the output consists of a class label and a bounding box. The basic architecture was designed and conceived as an extension of Faster R-CNN, which is made up of a deep convolutional network that proposes regions and a detector that uses those regions.(8)


Image segmentation can be applied to different fields, e.g. pattern recognition, image compression and image retrieval; these tasks are required in countless sectors, especially the biomedical one. In fact, many image segmentation algorithms now exist and are extensively applied in science and daily life.(9)

Diagnostic imaging, for example, is a fundamental tool in the biomedical sector. Established modalities such as magnetic resonance imaging (MRI), computed tomography (CT), x-ray radiography (RX) and others allow non-invasive mapping of a subject's anatomy and are a key component in diagnosis and treatment planning. In the computational processing of such biomedical images, segmentation algorithms provide important support for the quantification of tissue volumes, the diagnosis and localization of pathology, the study of anatomical structure, treatment planning, the partial-volume correction of functional imaging data and computer-integrated surgery. Methods for performing segmentation vary widely depending on the specific application, imaging modality and other factors. For example, the segmentation of brain tissue has different requirements from the segmentation of the liver, so there is currently no single segmentation method that yields acceptable results for every medical image. General imaging artifacts such as noise, partial-volume effects and motion can also significantly affect the performance of segmentation algorithms.

There are also more general methods that can be applied to a wider variety of data; however, methods specialized for particular applications can often achieve better performance by taking prior knowledge into account. An example of the different outputs obtained on the same object by applying different methods is shown below.

The selection of the approach and method to be used therefore represents the greatest challenge, requiring substantial prior knowledge of image segmentation as well as practical experience in applying segmentation algorithms.(10)


The advantages and disadvantages related to the use of image segmentation are partly specific to the individual segmentation techniques and partly generic, related to the use of machine learning. Recognizing the strengths and weaknesses of the different segmentation techniques is therefore fundamental when choosing an approach for a specific application.

One of the most commonly used methods (described above) is the threshold method, in which image pixels are divided according to the intensity levels of the image. It is mainly used to distinguish foreground objects from the background. Its main advantages are certainly its simplicity of implementation and its speed of response, as well as its good effectiveness on some particular types of images (e.g. documents, controlled lighting). The algorithm is not flawless, however: there is no guarantee of object coherency, so segmented objects may have holes, extraneous pixels, and so on.

Another widely used method is region-based segmentation, which partitions an image on the basis of similar characteristics of the pixels. Unlike the previous method, connected regions are guaranteed, which is its main advantage. The main drawbacks, on the other hand, are that the position and orientation of the image can lead to a blocky final segmentation, and that regular division leads to over-segmentation (too many regions) by splitting.(11)

The advantages and disadvantages of segmentation techniques, finally, combine with those of machine learning itself: these include the reduction of human error (the main advantage), but also the time required to acquire data for the training process, which is often laborious and represents the main drawback of machine learning technology.


The future of image segmentation techniques is closely linked to the continuing evolution of machine learning. The crucial points that could mark the near future are the evolution of improved unsupervised algorithms and the increasingly marked introduction of robots based on artificial intelligence. Unsupervised algorithms, in fact, work on their own: they discover and identify hidden patterns or groupings within a data set that would not have been identified using supervised algorithms. Their further development would give important impetus to the implementation of artificial intelligence systems, which will undoubtedly improve machine-learning-based image segmentation techniques and their applications (especially in assisted surgery).


This paper has dealt with the problem of image segmentation, starting from a general introduction and then examining in depth its machine-learning-based techniques. It is evident that these techniques are now established and used in many sectors, especially the biomedical one, in which they underpin the most common diagnostic techniques, and it is reasonable to expect that the evolution of machine learning will give them an important boost in the coming years. This will certainly increase their usefulness even in the most everyday applications, thus overcoming the current application limits.


  1. Nabil MADALI (2020) 'Introduction To Image Segmentation' [Online]
  2. WIKIPEDIA, 'Image segmentation' [Online]
  3. Eai Fund Offical (2018) 'History of image segmentation' [Online]
  4. WIKIPEDIA, 'Thresholding (image processing)' [Online]
  5. WIKIPEDIA, 'Edge detection' [Online]
  6. WIKIPEDIA, 'Cluster analysis' [Online]
  7. H. D. Cheng, X. H. Jiang, Y. Sun and Jing Li Wang, 'Color Image Segmentation: Advances and Prospects' [Online]
  8. Derrick Mwiti (2020) 'Image Segmentation in 2021: Architectures, Losses, Datasets, and Frameworks' [Online]
  9. 'Segmentation tutorial' [Online]
  10. Dzung L. Pham, Chenyang Xu, Jerry L. Prince (1998) 'A Survey of Current Methods in Medical Image Segmentation' [Online]
  11. Anjna, Rajandeep Kaur (2017) 'Review of Image Segmentation Technique' [Online]