Author: Gianluca Marchio
Date of submission: 27.04.2021



This paper deals with the field of machine learning, with a particular focus on applications in object detection. Starting from a generic introduction on the problem of machine learning, and from a historical hint, the methodologies applied for machine learning are explained. The main part of the document then focuses on machine learning algorithms regarding object detection. By analyzing the most common applications, associated advantages and disadvantages are then discussed. A final space is left for future applications and innovations in the sector.


Machine learning is an ever-evolving branch of computation algorithms designed to emulate human intelligence. These algorithms are able to emulate and make decisions by learning from the surrounding environment, at present they are considered the workhorse of the so-called big data.(1)


The last 25-30 years have characterized an explosion in the field of research in machine learning. This was due to several causes, and, in particular, to the integration between various research branches that were initially separate from each other. Research communities in symbolic machine learning, computational learning theory, neural networks, statistics, and pattern recognition have begun to work together resulting in an accelerated evolution of machine learning. The results obtained, rather than converging in a single direction, diverge in numerous applications that mostly make use of neural networks. Among the most common applications there are therefore pattern recognition, prediction, optimization and
signal processing up, but also image classification, object detection and tracking, and action recognition.(2) The success of ML in recent years is largely due to the evolution of some existing technologies such as, for example, in the field of image recognition and object detection. The advances made over the years have demonstrated the great impact that ML can have in specialized tasks. In parallel with the rise of these technologies in industrial applications, scientists have become  increasingly interested in the potential of ML, especially physicists. The meeting point between the two disciplines, in fact, is the process of collecting and analyzing data to design models capable of predicting the behavior of complex systems.
Even the most inaccessible applications to automated software have been successfully enabled, largely thanks to deep learning technology.(3)


Machine learning represents a set of computational methods used to make and improve predictions or behaviors based on data sets. A very simple example of how it works can be applied to real estate sales: to predict the value of a home, the computer
would learn patterns from past home sales. The learning methods, in general, are divided into 3 main categories:

  1. Supervised learning: This method covers all forecasting problems where we have a set of input data for which we already know the output of interest and want to learn how to predict the outcome of new data. The protagonists are therefore training algorithms that allow the machine to learn through these input-output examples.
    The machine learning algorithm learns a model by estimating parameters, such as the weights of a computation graph or learning structures (such as computational trees). Furthermore, the algorithm is guided by a score or loss function that is minimized to maximize machine performance. In the example of the sale of a property, the machine minimizes the difference between the estimated price of the house and the price. predicted.
    The goal of supervised learning is to learn a predictive model that maps data characteristics to an output. A categorical output is typical of the classification activity, while a numerical output defines the regression activities;(4)
  2. Reinforcement learning: → It represents a particular case of supervised learning. In this case the only information available is whether the output predicted is correct or not, the desired output is “hidden”;
  3. Unsupervised learning: In this case the machine on self-
    adapts to its internal environment to determine with which functionality to group the input data. It continues to be there research in this field especially with regard to robots, these, in fact, could learn for themselves when they find new environments for which there is no specific training set.;(5)

Machines, thanks to the evolution of machine learning, now surpass humans in many tasks. Furthermore, although sometimes they do not reach the same human capabilities, great advantages remain in terms of speed, reproducibility and scaling. A machine learning model can complete a task much faster than humans, deliver consistent and reliable results, and can be replicated to other machines easily and inexpensively.
Training a human to perform a certain task can take several years and is economically disadvantageous compared to machine learning. Computational schemes, on the other hand, turn out to be very complex and represent a sort of black box in which it is impossible to obtain information on how the forecast, or the output, was obtained. Millions of numbers are needed to describe a deep neural network and it is not possible to fully understand the model. Other models, such as the random forest, are made up of hundreds of decision trees that “vote” for predictions. To fully understand how the decision was made it would be necessary to examine the “votes” and the internal dynamics of each of these structures and this simply does not work, this is due to the inherent complexity of each sub-structure. The best performing models, in fact, are often blends of several models (also called ensembles).;(4)


The problem of object detection and tracking is a long-standing problem in the field of computer vision and machine learning. The use of CNNs, convolutional neural networks, turns out to be one of the most suitable and most used methods, but the actual development did not undergo a major development before 2012 due to the high computing costs and limited power in terms of computational resources. available. The evolution of CNN has allowed the development of different
algorithms for this purpose, some examples will be discussed below. These methods, in general, exploit fast and generic measurements to test whether a sampled window is a potential object or not, and further pass the output object proposals to more sophisticated and deep detectors to determine whether they are background or belong to a specific object class. One of the most known and important object proposal based CNN detector is Region-based CNN (R-CNN). This algorithm uses Selective Search (SS) to extract around 2000 bottom-up region that probably contain objects. Then, these region proposals are warped to a pre-setted size (227 × 227), and a pre-trained CNN is used to extract features from them. Finally, a binary SVM classifier is necessary to final detection. R-CNN, although it produces a noticeable performance improvement, requires an excessively high computation cost as the CNN-based feature extractor is applied separately to each selected region.

An example of an algorithm that addresses this problem effectively is OverFeat. OverFeat computes CNN features from an image pyramid for localization and detection. With this mode the calculation is shared between overlapping windows and is less expensive computationally, according to some recent works that proposed to share the computation in feature extraction.

Spatial pyramid pooling network (SPP net), instead, is a pyramid-based version of R-CNN, which introduces an SPP layer to reduce the constraint that input images must have a pre-setted size. Unlike R-CNN, SPP net extracts the feature maps from the entire image only once, and then applies spatial pyramid pooling on each candidate window to get a fixed-length representation. The main disadvantage is that the defined training procedure is a multi-stage pipeline, it will therefore be impossible to train the CNN feature extractor and SVM classifier jointly.

Fast RCNN efficiently enhances the SPP network using an end-to-end training method. All network levels it can be updated during setup. This mode greatly affects the accuracy of detection, as well as clearly simplifying the learning process. Later, Faster R-CNN introduces a region proposal network (RPN) for object proposals generation and achieves further speed-up.

Finally, other particular algorithm is YOLO and SSD. YOLO treats object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. In this case the whole detection pipeline is a single network which predicts bounding boxes and class probabilities from the full image in one evaluation, and can be optimized end-to-end directly on detection performance. SSD, instead, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. That’s the reason why, with this multiple scales setting and their matching strategy, SSD is significantly more accurate than YOLO.;(6)


To emphasize how machine learning is so current and useful, the most successful applications will be listed below, it is also good to note as late as 1985 all this was impossible to achieve.

  • Speech recognition: All commercial speech recognition systems and voice assistants are based on the technology offered by machine learning which in this case is used to recognize speech. The precision and accuracy achieved is another level than a hand-programmed system. Many commercial speech recognition systems involve two distinct learning phases. The system is trained before it is shipped (training the general system in a speaker-independent fashion), and a second phase of training is done after the user purchases the software (to achieve greater accuracy by training in a speaker-dependent fashion);
  •  Computer vision: By now, many current vision systems (from face recognition systems to systems that automatically classify microscope images of cells) are developed using the technology offered by machine learning. The post office in the United States, for example, uses a machine learning-based application that can perfectly recognize handwriting to sort mail automatically. Over 85% of handwritten mail is sorted this way;
  • Bio-surveillance: Per rilevare e monitorare i focolai di malattia (for example Covid-19) una varietàdi sforzi del governo ora fa uso dell’apprendimento automatico. Another example is the RODS project that involves real-time collection of admissions reports to emergency rooms across western Pennsylvania. The use of machine learning software is necessary to learn the
  • profile of typical admissions so that it can detect anomalous patterns of symptoms and their geographical distribution;
  • Robot control: Machine learning technologies have offered a significant boost to the development of robots. For example, several researchers have demonstrated the use of machine learning to acquire control strategies for stable helicopter flight and helicopter aerobatics.;(7)


The advantages offered by the use of machine learning are so important that they potentially revolutionize every professional sector, among the main advantages we find:

  1. Reduction in Human Error: In weather forecasting, for example, the use of artificial intelligence has greatly reduced human error. If the system is properly trained and guarantees the required degree of precision and accuracy, its use will significantly reduce human error in the same task;
  2. Faster Decisions: The use of machine learning, especially when combined with other technologies, allows you to make quick decisions without analyzing secondary factors that could influence the human decision, such as the emotional factor;;(8)
  3. Taking risks on behalf of humans: With machine learning it is certainly possible to drastically reduce the risks to which human workers are exposed, the use of automated machines is now known for this advantage.;(9)

On the other hand, however, among the most well-known disadvantages are:

  1. Data acquisition: Often the training process is quite laborious and takes a long time to process and learn from the data, it is also absolutely necessary that these are of good quality;
  2. Interpretation: When the algorithms help in all these processes and give a resulting output, this given output must be checked for any errors;(10)
  3. Lack of judgement calls: In some exceptional circumstances (for example an emergency) humans can make decisions that are not usual but necessary, something that machine learning may never be able to do.(9)


The evolution of machine learning has not stopped over the years but continues to have continuous impulses from different fields of research. Among the most prominent topics are:

  1. Improved Unsupervised Algorithms: Unsupervised algorithms work on artificial intelligence. When algorithms are left to work on their own, they discover and identify hidden patterns or groupings within a data set that would not have been identified using supervised algorithms. Further improvements in unsupervised machine learning algorithms may be seen in the near future;
  2. Increased Adoption of Quantum Computing: Quantum Machine Learning algorithms have great potential to completely transform the future of ML and their applications. Quantum computers, when they take advantage of machine learning, lead to faster and more efficient data processing. This performance improvement helps companies achieve results that were not possible using classic ML techniques. Two big companies like Microsoft and Google have already announced their plans to take advantage of the technology in the near future;
  3. Enhanced Personalization: Machine learning personalization algorithms are used to offer customers product recommendations. Machine learning algorithms read customer behavior and draw conclusions about people’s interests. Businesses can use this information to send product recommendations such as emails and personalized messages to their potential customers;
  4. Rise of Robots: Robots use machine learning algorithms to perform the tasks for which they are employed. As robots perform tasks faster, companies are adopting robotic techniques to increase their productivity.


This paper has dealt with the problem of machine learning starting with a general introduction, delving into applications in object detection, and has left space for future innovations. As mentioned, the potential of this already very current technology is evident, but the real unknown remains the future. It is reasonable to expect that it will find a definitive consecration and a new rebirth by overcoming the current limits of use.


1. Issam El NaqaEmail, Martin J. Murphy (2015) ‘Machine Learning in Radiation Oncology’ [Online], available at:

2. Thomas G. Dietterich (1997) ‘Machine-Learning Research Four Current Directions’ [Online], available at:

3. Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, Lenka Zdeborová (2019) ‘Machine learning and the physical sciences’ [Online], available at:

4. Christoph Molnar (2021) ‘Interpretable Machine Learning’ [Online], available at:

5. Dave Anderson, George McNeill (1992) ‘ARTIFICIAL NEURAL NETWORKS TECHNOLOGY’ [Online], available at:

6. Jiuxiang Gua, Zhenhua Wangb, Jason Kuenb, Lianyang Mab, Amir Shahroudyb, Bing Shuaib, Ting Liub, Xingxing Wangb, Li Wangb, Gang Wangb, Jianfei Caic, Tsuhan Chen (2017) ‘Recent Advances in Convolutional Neural Networks’ [Online], available at:

7. Tom M. Mitchell (2006) ‘The Discipline of Machine Learning’ [Online], available at:

8. Sunil Kumar (2019) ‘Advantages and Disadvantages of Artificial Intelligence’ [Online], available at:

9. Cetpa Infotech (2020) ‘What Are The Advantages And Disadvantages Of Machine Learning?’ [Online], available at:

10. Amit Agrawal ‘Highlights the advantages and disadvantages of machine learning’ [Online], available at:

11. Scarlett Rose (2020) ‘What is the Future of Machine Learning?’ [Online], available at: