
Review Article
Fertil Sci Res. 2025;12:30
doi: 10.25259/FSR_28_2025

Dissecting Datasets Used by Artificial Intelligence or Deep Learning Models for Morphological Assessment of Blastocysts: A Systematic Review

Institute of Biological Science, SAGE University, Indore,
Department of Reproductive Medicine, Indore Infertility Clinic, Indore,
Department of Reproductive Medicine, Cellsure Biotech Research Centre, Dombivli, Maharashtra, India.

*Corresponding author: Doel Bose Pande, Department of Reproductive Medicine, Indore Infertility Clinic, Indore, India. doelpande@gmail.com

Licence
This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-Share Alike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.

How to cite this article: Pande DB, Deshmukh H, Kandari S, Verma SK. Dissecting Datasets Used by Artificial Intelligence or Deep Learning Models for Morphological Assessment of Blastocysts: A Systematic Review. Fertil Sci Res. 2025;12:30. doi: 10.25259/FSR_28_2025

Abstract

This systematic review scrutinised the datasets used to train Artificial Intelligence (AI) tools developed for automated blastocyst assessment. It is widely acknowledged that the quality of these datasets significantly influences the performance of the resulting AI models. Analysis of datasets from 26 studies revealed considerable variation in dataset size, data diversity, image quality, image capture mechanism and timing, class distribution, dataset endpoints, and metadata usage. Some models incorporate morphokinetic or morphometric annotations and clinical metadata, whereas others rely solely on single-point static images. Many studies omit crucial information, such as image capture timing, embryo transfer strategy, and whether confounding factors such as uterine factors were removed, hindering cross-study comparisons. Standardisation of datasets is vital for accurate assessment and comparison of commercially available AI models used for blastocyst assessment. The absence of standardised parameters and the failure to remove confounding variables emphasise the need for greater transparency and standardisation in dataset creation and reporting. Future research should prioritise constructing a robust, gold-standard large dataset that includes diverse imaging data and excludes confounding factors. In the absence of such a dataset, comparisons between AI models remain highly subjective.

Keywords

Artificial intelligence
Blastocyst assessment
Computer vision in medical imaging
Deep learning
Embryo assessment
Machine learning

INTRODUCTION

In recent years, the use of Artificial Intelligence (AI) and computer vision in medical image processing has witnessed notable advances. Studies such as that of Puttagunta et al.[1] have highlighted the substantial promise of AI in automating complex image analysis, making it a compelling choice for applications that rely heavily on visual data, such as radiology.

In vitro fertilisation (IVF) and the introduction of extended culture media have facilitated the extensive adoption of blastocyst culture.[2] Blastocyst culture enables the selection of the most robust embryo from the cohort and makes single-blastocyst transfers possible.[3] The main benefit of elective single-embryo transfer is to avoid transferring more than one embryo that could result in multiple gestations, which have increased antenatal complications and risks.[4] Central to this paradigm shift is the essential process of blastocyst assessment and selection using visual morphological data of an embryo, which has a substantial influence on the effectiveness and outcomes of IVF procedures.

Currently, this assessment relies predominantly on manual visual evaluation conducted by experienced embryologists. An embryologist usually performs this assessment at a fixed time post-insemination, also known as hours post-insemination or hpi.[5] An experienced embryologist observes various morphological features of blastocysts within the cohort to compare and determine the grade of the blastocysts. The most common scoring system for grading blastocysts is the Gardner scoring system.[6] Most clinics have an image capture system attached to an optical microscope, which allows embryologists to capture images of blastocysts for subsequent assessment. Cost considerations pose a significant barrier to the widespread adoption of time-lapse incubators for continuous embryo monitoring in clinical settings. Only a limited number of facilities worldwide have incorporated this technology. Using a time-lapse system, an embryologist can visualise individual embryos in culture at different time points without removing them from the incubator.[7]

At the core of the process of embryo selection is the concept of understanding the morphology of a blastocyst and comparing the morphology with others in the cohort to grade and rank the blastocyst.[8] This enables embryologists to choose the best embryo for transfer or to freeze it for later use. Embryologists who perform this task regularly learn and gather experience in embryo grading based on the number of embryos they grade and the known outcome or endpoint of the embryo transfer. This cyclic process enables embryologists to perform better as they gain more experience. Most embryologists worldwide use the Gardner scoring system to score three blastocyst parameters: expansion grade, inner cell mass (ICM) quality, and trophectoderm (TE) quality.[3]
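Since a Gardner code compresses the three graded parameters into one short string (e.g. "4AA"), a small parser makes that structure explicit. The sketch below is purely illustrative and assumes the common three-character form; the class and function names are our own, not taken from any cited study.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class GardnerGrade:
    expansion: int  # expansion stage, 1 (early blastocyst) to 6 (hatched)
    icm: str        # inner cell mass quality: A, B, or C
    te: str         # trophectoderm quality: A, B, or C

def parse_gardner(grade: str) -> GardnerGrade:
    """Parse a Gardner-style grade string such as '4AA' (illustrative only)."""
    m = re.fullmatch(r"([1-6])([A-C])([A-C])", grade.strip().upper())
    if not m:
        raise ValueError(f"not a Gardner grade: {grade!r}")
    return GardnerGrade(int(m.group(1)), m.group(2), m.group(3))
```

For example, `parse_gardner("4AA")` yields `GardnerGrade(expansion=4, icm='A', te='A')`. Very early blastocysts (expansion 1–2) are often graded without ICM/TE letters in practice; this sketch deliberately ignores that case.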

In recent years, significant progress has been made in leveraging machine learning (ML) algorithms and computer vision to automate tasks that were traditionally performed manually. This advancement extends to the domain of medical imaging, where the acquisition and analysis of image data may play a crucial role in clinical decision-making. AI models trained using either supervised or unsupervised learning methods have been used to enhance these processes.[9]

Given that these models operate at the raw pixel level, over time, they possess the capability to learn a spectrum of features, ranging from macroscopic attributes such as embryo/cell shape and size to finer details such as texture and pattern, all derived from the training data.[10]

Various AI models have been tested in recent years for blastocyst assessment, with varying degrees of accuracy.[11] By analysing the thousands of images used to train them at very high processing speed, AI models can learn specific image features and assist an embryologist by either ranking embryos or assigning a predictive score, acting as a decision support tool.[12] A systematic review by Salih et al.[13] indicated that AI models demonstrated a median accuracy of 77.8% in predicting clinical pregnancy using patient clinical treatment information, in contrast to 64% for embryologists alone. When integrating both images/time-lapse data and clinical information, the AI models achieved a higher median accuracy of 81.5%, compared with a median accuracy of 51% for clinical embryologists.

The effectiveness of AI models used for blastocyst assessment depends substantially on the datasets employed to train them.[14] The capacity to assess the comparative performance of various commercially available AI models depends on their adherence to comparable standards in the input data used for training. This includes considerations such as the quality of images fed to AI systems, disparity in image capture tools, similarity in the timing of image capture, diversity in image resolution, uniformity in the use of image annotations and clinical metadata, and meticulous collection of comparable endpoints, whether foetal heartbeat or live birth. The datasets used to train AI models, especially those trained to predict implantation or live birth, should ideally exclude confounders such as maternal age, embryo transfer strategy (fresh or frozen transfer), uterine factors, and severe male factors. How a dataset is split between negative and positive outcomes further shapes the performance of AI models in blastocyst morphological assessment.[15]
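To make the confounder argument concrete, the hedged sketch below shows the kind of eligibility filter a dataset curator might apply before training an implantation-prediction model. All field names and the FET-only default are hypothetical illustrations, not taken from any reviewed study.

```python
# Hypothetical record filter: exclude cycles with documented uterine factors
# or severe male factor, and restrict to a single transfer strategy, before
# the images enter an implantation-prediction training set.

def eligible_for_training(record: dict, transfer_type: str = "FET") -> bool:
    return (
        not record.get("uterine_abnormality", False)
        and not record.get("severe_male_factor", False)
        and record.get("transfer_type") == transfer_type
    )

cohort = [
    {"id": 1, "transfer_type": "FET", "uterine_abnormality": False},
    {"id": 2, "transfer_type": "fresh"},
    {"id": 3, "transfer_type": "FET", "severe_male_factor": True},
]
kept = [r["id"] for r in cohort if eligible_for_training(r)]
assert kept == [1]  # only the confounder-free FET cycle survives
```

Reporting such exclusion rules alongside the dataset would let reviewers judge whether a model's apparent accuracy reflects embryo quality rather than uterine or male-factor noise.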

This systematic review highlights the need for standardised datasets, as variations in key characteristics hinder meaningful comparisons across AI models. Therefore, this systematic review seeks to address the following research questions: What are the critical differences in dataset characteristics used to train AI models for automated blastocyst assessment, and how could creating a universal dataset standard enable reliable comparisons of the accuracy of various AI models?

MATERIAL AND METHODS

The systematic review methodology was in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.[16] This research question was formulated based on the objectives of this review.

Search Strategy

A comprehensive literature search was conducted between 28th June 2023 and 5th July 2023 in PubMed and Elsevier Library using the keywords ‘Deep Learning’ OR ‘Machine Learning’ OR ‘Artificial Intelligence’ OR ‘Computer Vision’ AND ‘Blastocyst’ OR ‘Embryo.’ An additional search was performed on January 15, 2025, to update the existing literature.

Study Selection

The initial search retrieved 384 studies. Restricting results to articles published in the last 10 years yielded 362 studies. After excluding 32 duplicates, the remaining studies were screened to include only those published in English and based on human blastocysts, leaving 186 studies to be screened.

Four other eligible studies were identified through thorough forward and backwards citation mining of the selected relevant literature by employing the Open Knowledge Maps Tool on 29th August 2024.

Excluded Studies

Articles related to the use of AI models in the assessment of cleavage-stage embryos (day 3 embryos) and oocyte morphology were excluded. Additionally, generic narratives on the use of AI in IVF labs and gamete assessments were excluded. Studies that used AI for the association of clinical parameters and live births with proteomic, transcriptomic, and radiomic models were also excluded. Furthermore, we excluded studies that solely aimed to assess the efficacy of existing AI models, such as the Intelligent Data Analysis (iDA) and Known Implantation Data (KID) scores, on specific datasets annotated by the authors. These studies did not involve training the AI models but rather focused on applying pre-trained models to datasets with clinical or morphological annotations.

Primary selection was performed based on title and abstract screening using the PRISMA guidelines, which resulted in 26 eligible studies.

Data Extraction

Data extraction was independently performed by two reviewers, Doel Bose and Sayali Kandari (DB and SK). A consensus was reached through discussion, and a third reviewer, Sandeep Kumar Verma (SKV), resolved any persistent disagreements.

A standardised spreadsheet was used for meticulous data extraction, facilitating the systematic collection of essential information from each included study. The key parameters were as follows:

  • Dataset size: Total number of samples/images in each dataset.

  • Span of data collection: Span of data collection in years.

  • Dataset source clinics: Clinic(s) or providers of the dataset.

  • Image capture technique: Static image or time-lapse video.

  • Image resolution: Pixel dimensions or image quality within the captured dataset.

  • Image/video capture device: Camera/device vendor and type.

  • Metadata with image: Maternal age and clinical parameters used to train the AI model.

  • Dataset outcome or ground truth or endpoint: Was the outcome measured in terms of live birth or cardiac activity? If cardiac activity was measured, was it assessed at 6 or 8 weeks?

  • Day of image capture: Day of blastocyst image capture (D5, D6, or D7).

  • Dataset capture timing: When the image was captured, in terms of hpi.

  • Number of focal planes used for image/video capture: The number of focal planes utilised in capturing images or videos of embryos intended for AI model training.

  • Dataset class distribution: The distribution of images across different classes or categories, that is, the percentage of images with a negative outcome and the percentage of images with a positive outcome.

  • Type of embryo transfer: Whether images are related to fresh embryo transfer, frozen embryo transfer (FET), or a combination of both strategies.

  • Public availability of data: Whether the datasets were publicly available for use by other researchers.
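The extraction parameters above can be read as a record schema, one row per study. A minimal sketch follows, assuming a flat layout; every field name is our own invention mirroring the bullet list, not the authors' actual spreadsheet columns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetRecord:
    """Hypothetical one-row-per-study schema for the extraction spreadsheet."""
    dataset_size: int                     # total samples/images
    collection_span_years: Optional[str]  # e.g. "2014-2018"
    source_clinics: int                   # number of contributing clinics
    capture_technique: str               # "static" or "time-lapse"
    resolution: Optional[str]             # e.g. "480x480"
    capture_device: Optional[str]         # camera/device vendor and type
    uses_metadata: bool                   # maternal age / clinical parameters
    endpoint: str                         # "live birth", "FHB 6-8 wk", ...
    capture_day: str                      # "D5", "D5/D6", "D5-D7"
    capture_hpi: Optional[str]            # e.g. "110-116"
    focal_planes: Optional[int]           # focal planes per image/video
    pct_positive: Optional[float]         # class distribution (positive share)
    transfer_type: Optional[str]          # "fresh", "FET", or "both"
    publicly_available: bool
```

A schema like this makes missing values explicit (the many `Optional` fields correspond directly to the "not mentioned" entries in Tables 1–3), which is the reporting gap the review argues against.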

Quality Assessment

Two authors independently reviewed the abstracts and excluded those that did not meet the inclusion criteria. The selected papers were assigned to both reviewers [Doel Bose (DB) and Sayali Kandari (SK)], with conflicts resolved by a third author, Hemant Deshmukh (HD), through majority voting. Data extraction and quality assessment were conducted independently by two authors, and disagreements were resolved by a third author. Since the majority of studies in our review did not strictly qualify as medical interventions, a more tailored approach was chosen: the risk of bias in datasets was assessed against predefined criteria recorded in comparison spreadsheets.

RISK OF BIAS ASSESSMENT

Two independent reviewers (DB and SK) rigorously evaluated the risk of bias in key domains, including the conflicting interests of authors, study design, participant selection, confounding variables, and outcome measurements. Discrepancies were resolved through arbitration by a third reviewer, Sandeep Kumar Verma (SKV), when needed.

OUTCOME

During the screening phase, both reviewers meticulously examined the abstracts, resulting in the exclusion of 137 studies deemed irrelevant to the research questions. Further scrutiny of the remaining pool led to the assessment of 49 eligible studies. Following a thorough evaluation of the literature abstracts and titles, 22 studies that precisely met the research question criteria were identified.

In addition to the systematic screening process, advanced tools like Research Rabbit and Connected Papers were utilised. These tools, employed to visualise the network of papers, proved instrumental in identifying any potentially relevant studies that might have been overlooked through conventional search methods.

Furthermore, through backwards and forward citation mining using the above tools, four more studies were added, resulting in a total of 26 studies included in this systematic review. The distribution of results at each stage is presented graphically in Figure 1.

Figure 1:
PRISMA flow diagram of study selection. Systematic review of use of artificial intelligence for blastocyst assessment. The literature search included studies published since 2013 written in the English language and the ones done on human blastocysts.

MAIN RESULT

In our review of 26 studies on blastocyst assessment using AI, the dataset sizes varied widely from 160 images to 171,239 images.[17,18] The dataset source showed notable variations, with fifteen studies relying on a single clinic’s data, thus potentially limiting adaptability.

The timing of blastocyst image capture varied among studies. Notably, seven studies did not specify the timing of image capture on day 5, and seven studies used a wide window for capturing blastocyst images from D5 to D7 (104–130 hours). In contrast, only seven studies captured blastocyst images within the recommended fixed window of 110–116 hours post-insemination to maintain consistency of the input data.[19–25] This highlights the need for standardisation and consistency to achieve accurate and comparable blastocyst assessment results.

The distribution of dataset classes and sampling strategies exhibited considerable variation, ranging from the common 40%–50% distribution of good embryos and 50%–60% distribution of poor embryos to an extreme case featuring only 8% good and 92% poor embryos.[26]
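Class imbalance of this magnitude typically has to be compensated during training. One standard remedy, not attributed to any of the reviewed studies, is inverse-frequency loss weighting, sketched here for the extreme 8%/92% split cited above:

```python
def inverse_frequency_weights(class_counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by total / (n_classes * class_count), a common
    heuristic so that minority-class examples contribute more to the loss."""
    total = sum(class_counts.values())
    n = len(class_counts)
    return {c: total / (n * k) for c, k in class_counts.items()}

# The extreme split from the text: 8% good vs 92% poor embryos.
weights = inverse_frequency_weights({"good": 80, "poor": 920})
```

With an 80/920 split, the minority "good" class receives weight 6.25 versus roughly 0.54 for "poor", so each minority example counts proportionally more; without some such correction, a model can score well simply by predicting the majority class.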

In terms of imaging techniques, thirteen studies employed time-lapse only, while eight utilised only optical light microscopy. Most studies based on static images used either optical microscope images or static frames from time-lapse videos as the image source. Two exceptions stand out: Diakiw et al.[27] used both types of static image sources, while Loewke et al.[15] uniquely included low-resolution static images from a stereo zoom microscope as a third image source.

In terms of focal planes, 17 studies employed static images with a single focal plane for the AI model training. Only one study utilised 11 focal planes per image for the evaluation of blastocysts, whereas another study utilised seven focal planes per image to train the AI model, showing diverse approaches in terms of input data.[21,28]

None of the studies explicitly indicated the elimination of confounding factors by excluding females with uterine abnormalities or issues related to uterine receptivity from the datasets used to train AI models. One study based on predicting the euploidy of blastocysts by AI eliminated the confounding uterine factors by only considering live birth and aneuploid miscarriages verified by CVS (Chorionic Villus Sampling).[17]

Seven studies incorporated data from both fresh and FETs, whereas five studies did not specify the type of embryo transfer strategy (fresh or frozen). Of the remaining studies, eight relied exclusively on ground truth data from FET cycles.

The datasets exhibit a variety of ground-truth measurements. Only one dataset employs serum beta-human chorionic gonadotropin (βHCG) levels on day 7 post-embryo transfer as an indicator of early pregnancy, while six studies use foetal heartbeat scans between 6 and 10 weeks.[29] Nine datasets collected live birth as the endpoint. Additionally, five studies utilised Preimplantation Genetic Testing for Aneuploidy (PGT-A) results, providing a genetic perspective on embryo viability.[19,27,30–33]

SYNTHESIS OF RESULTS

Variability in Dataset Size

In our comprehensive review of 26 studies, each study was thoroughly scrutinised to ascertain the actual number of images used to train the AI model, as shown in Table 1.[15,17–40,62] Many papers reported an inflated dataset size but later, in the ‘Material and Methods’ section, mentioned the exclusion of poor-quality images or data augmentation by flipping and rotation. Thus, only the filtered images fed to the AI model during training or validation, without data augmentation, were counted towards the dataset size.
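The gap between headline and filtered counts is easy to see: horizontal/vertical flips combined with 90° rotations alone inflate an image count eightfold (the eight symmetries of a square). The helper below is our own illustration of that arithmetic, not a procedure from any reviewed paper.

```python
def augmented_count(n_filtered: int, flips: bool = True, rotations: int = 4) -> int:
    """Images produced from n_filtered originals by rotation/flip augmentation:
    4 right-angle rotations x optional mirroring = up to an 8x inflation."""
    factor = rotations * (2 if flips else 1)
    return n_filtered * factor

assert augmented_count(1000) == 8000          # 8x with flips + 4 rotations
assert augmented_count(1000, flips=False) == 4000
```

This is why the review counts only pre-augmentation, quality-filtered images as the dataset size: a paper advertising 8,000 training images may rest on only 1,000 distinct embryos.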

Table 1
Comparison of dataset size and number of clinics involved in data collection for each review paper.
Sr. no Authors Dataset size Span of data collection Number of clinics involved
1 Loewke et al.[15] 5923 2015–2020 11
2 Miyagi et al.[17] 160 2008–2017 1
3 Chen et al.[18] 171,239 2014–2018 1
4 Barnes et al.[19] 10,378 2012–2017 2
5 Khosravi et al.[20] 1764 2012–2017 1
6 Wang et al.[21] 1025 2017–2018 1
7 Bormann et al. [22] 742 Not mentioned 1
8 Bormann et al.[23] 3469 Not mentioned 1
9 Miyagi et al.[24] 5691 2009–2017 1
10 Kanakasabapathy et al.[25] 542 Not mentioned 2
11 Tran et al.[26] 10,638 2014–2018 8
12 Diakiw et al.[27] 1001 2011–2020 10
13 Giscard d’Estaing et al.[28] 854 2013–2017 1
14 Chavez-Badiola et al.[29] 946 2015–2019 3
15 Huang et al.[30] 1803 2018–2019 1
16 Diakiw et al.[31] 5010 2011–2020 10
17 Cimadomo et al.[32] 3604 2013–2022 1
18 Ma et al.[33] 3405 2018–2021 2
19 Onthuam et al.[34] 1194 2018–2022 1
20 Goldmeier et al.[35] 608 2014–2017 3
21 Geller et al.[36] 361 2016–2019 4
22 Berntsen et al.[37] 115,832 2011–2019 18
23 Enatsu et al.[38] 19,342 2011–2019 1
24 Liu et al.[39] 17,580 2016–2020 1
25 Yuan et al.[40] 1036 2019–2022 1
26 Huang et al.[62] 15,434 2018–2019 1

A wide range of dataset sizes was observed, ranging from a minimum of 160 images to a maximum of 171,239 images, indicating the diversity of the dataset size, as depicted in Figure 2.[17,18] It is important to note that two studies utilised a dataset size of less than 500 images, whereas two others utilised a dataset size of more than 100,000 images.[17,18,36,37] It is intuitively understood that a more extensive dataset contributes to the enhanced training and performance of AI models in blastocyst assessment.

Figure 2:
Dataset size used in studies. Two studies used fewer than 500 images to train the AI model, nine studies used around 1000 images, while two studies used more than 100,000 images for training.

Dataset Demographics and Source

Most studies (15 out of 26) trained their AI models on data from a single clinic, potentially limiting adaptability, as shown in Table 1.[17,18,20–24,28,30,32,37–39] Notable exceptions include a study using a diverse multinational dataset (USA, India, Spain and Malaysia) and another incorporating data from 18 clinics across five countries (Denmark, UK, Australia, Japan and Ireland), showing broader geographical coverage, as shown in Figure 3.[31,37]

Figure 3:
Number of clinics involved in data collection. Of the 26 studies included in the systematic review, 15 collected data from a single clinic. Only three studies collected diverse data that spanned more than one country.

Variability in Image Capture Mechanism and Resolution

A diverse range of image resolutions (100 pixels × 100 pixels to 1024 pixels × 768 pixels) reflects various equipment choices as shown in Table 2. Notably, 13 studies employed only time-lapse devices, while eight solely utilised optical light microscopes, illustrating technological diversity.[18,25,29,31,36,38,39] The number of focal planes in the time-lapse video ranged from 7 to 11, introducing complexity to the data collection method as shown in Figure 4.

Table 2
Comparison of resolution of image, type of input source, number of focal planes and type of image capture mechanism/camera used for capturing images in various datasets.
Sr.No Authors Resolution of image Type of input to AI model Focal plane(s) Types of camera used
1 Loewke et al.[15] 224 pixels × 224 pixels, 112 pixels × 112 pixels, 56 pixels × 56 pixels Static image, static images from TL, static images from Stereozoom 1 8 types - Olympus IX71, Nikon Diaphot 300, Olympus IX73, Olympus IX70, Nikon Diaphot, NIKON Eclipse TE300, Nikon SMZ 800 (Stereo Zoom), Embryoscope
2 Miyagi et al.[17] 100 pixels × 100 pixels Static image 1 Make not mentioned
3 Chen et al.[18] 264 pixels × 198 pixels Static image 1 1 type - Optical Light Microscope (Zeiss Axio Observer Z1)
4 Barnes et al.[19] 500 pixels × 500 pixels Static image 1 2 types - EmbryoScope, Embryoscope plus
5 Khosravi et al.[20] 500 pixels × 500 pixels Static images from TL 7 1 type -Embryoscope TL
6 Wang et al.[21] 1280 pixels × 960 pixels Static images from TL 11 1 type - ASTEC Time Lapse - CCM-iBIS
7 Bormann et al. [22] Not mentioned Static image 1 1 type - Embryoscope TL
8 Bormann et al.[23] 210 pixels × 210 pixels Static image 1 1 type - Embryoscope TL
9 Miyagi et al.[24] 50 pixels × 50 pixels Static image 1 Optical light microscope. Make not mentioned.
10 Kanakasabapathy et al.[25] Not mentioned Static image 1 1 type - Proprietary Optical System mounted on Smart Phone
11 Tran et al.[26] whole video TL video Multiple 2 types - Embryoscope, Embryoscope plus
12 Diakiw et al.[27] 480 pixels × 480 pixels Static image, static images from TL 1 6 types - Cooper Surgical Saturn5, Saturn3, Vitrolife Octax, Hamilton Thorne LYKOS, EmbryoScope TL,Merck GERI TL
13 Giscard d’Estaing et al.[28] 1000 pixels × 1000 pixels Static images from TL 7 1 type - Embryoscope TL
14 Chavez-Badiola et al.[29] 640 pixels × 480 pixels or 807 pixels × 603 pixels Static image 1 2 types - Olympus IX71, Olympus IX73
15 Huang et al.[30] Not specified (video) TL video Multiple 1 type - Embryoscope Plus TL
16 Diakiw et al.[31] 480 pixels × 480 pixels Static image 1 Optical light microscope. Make not mentioned.
17 Cimadomo et al.[32] Not available (NA) TL video Multiple 1 type - Embryoscope TL
18 Ma et al.[33] Not specified (video) TL video Multiple 1 type - Embryoscope Plus TL
19 Onthuam et al.[34] 224 pixels × 224 pixels Static image 1 Optical light microscope. Make not mentioned.
20 Goldmeier et al.[35] 224 pixels × 224 pixels TL video Multiple 1 type - Embryoscope TL
21 Geller et al.[36] 224 pixels × 224 pixels static image 1 Optical light microscope. Make not mentioned.
22 Berntsen et al.[37] 256 pixels × 256 pixels Static image 1 2 types - Embryoscope, Embryoscope Plus
23 Enatsu et al.[38] 480 pixels × 640 pixels Static image 1 Optical light microscope. Make not mentioned.
24 Liu et al.[39] 1024 pixels × 768 pixels Static images 2 Optical light microscope. Make not mentioned.
25 Yuan et al.[40] Not mentioned TL video Multiple 1 type - Embryoscope Plus TL
26 Huang et al.[62] 224 pixels × 224 pixels Static images from TL 1 1 type - Embryoscope Plus

CCM-iBIS: Cell Culturing Monitoring - Integrated Ballistics Incubation System.

Figure 4:
Focal planes used to capture images in datasets. Out of 26 studies, 17 collected images using a single focal plane, i.e., static images, while a few studies relied on capturing images of the same blastocyst at varying focal planes ranging from 2 to 11 focal planes.

In the majority of studies (12 of 26), a single data capture mechanism was used exclusively. Two studies did not specify their image capture technique.[17,24] Only two studies utilised more than one image-capture mechanism. Diakiw et al.[27] predominantly used static images from optical light microscopes for training and validation, reserving static frames from time-lapse videos specifically for testing, as shown in Figure 5. Loewke et al.[15] incorporated low-resolution images (56 pixels × 56 pixels) from a stereo zoom microscope in their training data, while also using static frames from time-lapse videos exclusively to test their model.

Figure 5:
Dataset image capture source. Out of 26 studies included in the systematic review, 14 collected images from the time-lapse system, while eight studies relied solely on static images collected from an optical light microscope.

Dataset Image Capture Camera Model

A notable observation made while evaluating the camera models used across the datasets is shown in Figure 6. Among the 26 studies, more than half (17 out of 26) used a single type of device to capture images, either a single type of optical light microscope or a single type of time-lapse incubator fitted with a single type of camera.[18–25,28,30,32,33,35,36,39,40] Exceptions are two studies that used six and eight camera models, respectively, for collecting their input datasets, as shown in Figure 6.[15,27]

Figure 6:
Types of image capture mechanisms used in the collection of image datasets. Thirteen studies used images from a single type of optical system while seven did not specifically mention the make or model of the optical system and camera. Two studies utilized six and eight types of image capture mechanisms, as shown.

Dataset Image Capture Timing

The timing of blastocyst image capture presents a significant discrepancy among studies. Thirteen studies lacked specific mention of the image or video capture timing on day 5 or day 6.[15,26–28,31,33,36,38–41] Among those providing the exact timing, seven studies used images captured within the range of 110–115 hpi, as shown in Table 3.[17,19–23,25] Additionally, six studies incorporated a wider range of timing, capturing images anywhere between 104 and 140 hours, i.e., day 5 to day 6.[18,24,28,30,32,33,37]

Table 3
Comparison of timing of image capture in terms of hours post insemination and day of embryo capture for datasets.
Sr. no Authors Timing of image capture Day of image capture
1 Loewke et al.[15] Not mentioned D5, D6, or D7
2 Miyagi et al.[17] 115 hpi D5
3 Chen et al.[18] 112–116 hpi (D5), 136–140 hpi (D6) D5, D6
4 Barnes et al.[19] 110 hpi D5
5 Khosravi et al.[20] 110 hpi D5
6 Wang et al.[21] 116 ± 1 hpi D5
7 Bormann et al. [22] 113 hpi D5
8 Bormann et al.[23] 113 ± 0.05 hpi D5
9 Miyagi et al.[24] 115 or 139 hpi D5, D6
10 Kanakasabapathy et al.[25] 113 hpi D5
11 Tran et al.[26] Not mentioned D5
12 Diakiw et al.[27] Not mentioned D5, D6
13 Giscard d’Estaing et al.[28] Not mentioned D5, D6
14 Chavez-Badiola et al.[29] 130–131.8 hpi D6
15 Huang et al.[30] 110–116 hpi (D5), 132.5–136 hpi (D6) D5, D6
16 Diakiw et al.[31] Not mentioned D5
17 Cimadomo et al.[32] <120 hpi = D5; 121–144 hpi = D6; >144 hpi = D7 D5, D6, D7
18 Ma et al.[33] Not mentioned D5, D6
19 Onthuam et al.[34] Not mentioned D5
20 Goldmeier et al.[35] Not mentioned D5
21 Geller et al.[36] Not mentioned D5
22 Berntsen et al.[37] 108–140 hpi D5, D6
23 Enatsu et al.[38] Not mentioned D5
24 Liu et al.[39] Not mentioned D5
25 Yuan et al.[40] Not mentioned D5, D6
26 Huang et al.[62] 105–125 hpi D5, D6

Only 7 out of 26 studies used a fixed time frame on day 5, defined in hpi, to capture blastocyst images, as shown in Figure 7. One study utilised a variable time frame for image capture, spanning days 5, 6, and 7, which raises questions about the quality of the dataset.[15]

Figure 7: Day and timing of image capture of datasets. Only 7 out of 26 studies use a fixed time frame with respect to hours post insemination (hpi) to capture a blastocyst image.

Dataset Endpoints or Ground Truth

In the majority of studies included in this systematic review, the dataset endpoint corresponded well with the intended objective of blastocyst assessment, as shown in Table 4. A notable exception was the study that ranked embryos based on their implantation potential.[29] Here, the endpoint was premature: serum βHCG measured on day 7 post-embryo transfer. A more robust approach would have been to assess βHCG levels at a later date or to monitor foetal heartbeat at 6–8 weeks. The bar chart in Figure 8 summarises the endpoints, or ground truths, adopted in the studies included within this systematic review.

Table 4
Comparison of sampling strategy in terms of distribution of negative and positive class and tracked endpoint of the datasets.
Sr. no Authors Class distribution (% negative : % positive) Primary aim of blastocyst assessment Endpoint/ground truth
1 Loewke et al.[15] Not clear Ranking blastocyst based on morphology FHB at 6–8 weeks
2 Miyagi et al.[17] 50% : 50% Predict probability for live birth (distinguish between normal and abortion categories) Live birth or aneuploid abortion verified by CVS sampling
3 Chen et al.[18] NA Automated grading of blastocyst Blastocyst grading
4 Barnes et al.[19] 57% : 43% Prediction of human blastocyst ploidy PGT-A result, FHB for non PGT-A embryo transfers
5 Khosravi et al.[20] 50% : 50% Assess blastocyst quality. Identify good-quality and poor-quality images Good vs. bad embryo classification
6 Wang et al.[21] 49% : 51% Comparing blastocyst quality evaluation using multifocal images Blastocyst classification
7 Bormann et al.[22] Not clear Scoring embryo to make disposition decisions for biopsy, freeze or discard Embryo selection for biopsy, cryopreservation or discard
8 Bormann et al.[23] 54% : 44% Identify blastocyst capable of implantation FHB
9 Miyagi et al.[24] 72% : 28% Evaluate likelihood of clinical pregnancy Live birth
10 Kanakasabapathy et al.[25] Not clear Assessment of blastocyst morphology for classification of embryo Blastocyst vs. non blastocyst classification
11 Tran et al.[26] 92% : 8% Foetal heartbeat prediction from Time Lapse video without morphokinetic annotation FHB at 7 weeks
12 Diakiw et al.[27] 41% : 59% Predict human embryo ploidy status using static images PGT-A result
13 Giscard d’Estaing et al.[28] 28% : 72% Live birth prediction from blastocyst scoring Live birth
14 Chavez-Badiola et al.[29] 56% : 44% Predict ploidy (viability) and implantation using static blastocyst images Serum βHCG on day 7 of ET
15 Huang et al.[30] Not clear Predict embryo ploidy status based on timelapse data PGT-A result
16 Diakiw et al.[31] 43% : 57% Likelihood of clinical pregnancy PGT-A result
17 Cimadomo et al.[32] Not clear Embryo grading and Euploidy prediction Live birth
18 Ma et al.[33] 43% : 44% : 12% Embryo selection using AI model trained with metadata PGT-A result
19 Onthuam et al.[34] 31% : 69% Predict embryo aneuploidy FHB at 6–10 weeks
20 Goldmeier et al.[35] 67% : 33% Predict embryo implantation potential by automated calculation of morphometric parameters FHB at 6–8 weeks
21 Geller et al.[36] 42% : 58% Predicting whether an embryo will lead to a pregnancy and predict outcome of that pregnancy Live birth
22 Berntsen et al.[37] 71% : 29% Correlation between morphokinetics and AI assessment FHB at 7 weeks
23 Enatsu et al.[38] 61% : 39% Comparison of live birth prediction using image only with image + clinical data ensemble AI model Live birth
24 Liu et al.[39] 64% : 36% Live birth prediction model based on image and clinical data Live birth
25 Yuan et al.[40] 47% : 53% Automated blastocyst quality assessment with multifocal images (good vs. poor) Live birth
26 Huang et al.[62] 54% : 46% Predict the probability of live birth from timelapse data Live birth

PGT-A: Preimplantation genetic testing for aneuploidy, FHB: Foetal heartbeat, βHCG: Beta human chorionic gonadotropin, ET: Embryo transfer, CVS: Chorionic villus sampling, TL: Time lapse.

Figure 8: Ground truth or end points of studies. Only one study used serum βHCG as end point while nine studies tracked patients till live birth. βHCG: Beta human chorionic gonadotropin, PGT-A: Preimplantation genetic testing for aneuploidy, FHB: Foetal heartbeat.

Another noteworthy study in this review centres on blastocyst assessment aimed at assigning a probability score for live birth. In the study by Miyagi et al.,[17] the endpoint was defined by recording live births or tracking aneuploid abortions verified by CVS, providing a more robust ground truth than relying solely on a negative βHCG or absent cardiac activity.

Dataset Cleansing – Removing Confounders

Most of the studies on blastocyst assessment aimed to predict either viability or a quality score. These studies did not filter their datasets to remove confounding factors, such as uterine abnormalities or poor endometrial receptivity, that could influence the clinical endpoint of foetal heartbeat at 6–8 weeks.

Only one study excluded confounders by considering only aneuploid miscarriages as the negative class, thereby eliminating confounding factors related to uterine abnormalities.[17]

Dataset Sampling Strategy and Class Imbalance

Considerable variation in dataset class distribution was observed among the included studies, as detailed in Table 4. A realistic distribution of close to 40% good embryos and 60% poor embryos was commonly employed in 9 out of 26 studies, as shown in Figure 9, while seven studies approached an even 50%–50% split.

Figure 9: Dataset class distribution—percentage of negative class vs. positive class. Notably, five studies did not have a clear mention of the class distribution. Seven adopted a class distribution of 50%–50% between negative and positive classes, while another nine opted for a distribution closer to reality, with 60% negative and 40% positive classes.

However, one study employed an unusually imbalanced class distribution, with 8% positive and 92% negative samples.[26] This emphasises the lack of standardised practice in defining the composition and sampling strategy of training datasets.
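The distortion such imbalance introduces can be illustrated with a minimal sketch (hypothetical counts mirroring the 92% : 8% split above, not data from the cited study): a degenerate classifier that always predicts the majority class scores 92% accuracy yet never identifies a viable embryo.

```python
# Minimal illustration (hypothetical data): why raw accuracy misleads
# on an imbalanced dataset such as a 92% negative : 8% positive split.

def evaluate(labels, predictions):
    """Return overall accuracy and sensitivity (recall on the positive class)."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    positive_preds = [p for y, p in zip(labels, predictions) if y == 1]
    sensitivity = sum(positive_preds) / len(positive_preds) if positive_preds else 0.0
    return correct / len(labels), sensitivity

# 92 non-implanting (0) and 8 implanting (1) embryos.
labels = [0] * 92 + [1] * 8

# A degenerate model that always predicts "will not implant".
majority_vote = [0] * 100

accuracy, sensitivity = evaluate(labels, majority_vote)
print(accuracy)     # 0.92 -> looks strong on paper
print(sensitivity)  # 0.0  -> misses every viable embryo
```

This is why class-balanced metrics (sensitivity, specificity, AUC) rather than plain accuracy are needed when datasets depart sharply from an even split.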

Embryo Transfer Strategy: Fresh vs. Frozen Transfer

Including data from both fresh and frozen embryo transfers (FETs) in seven studies introduces a confounder into the predictive model, as shown in Table 5.[15,22,26–29,37] It is widely acknowledged that fresh embryo transfers tend to have a lower pregnancy rate than FET cycles, primarily due to compromised uterine receptivity.[42] Studies focused on predicting embryo viability and euploidy logically opted for FET as their preferred embryo transfer strategy.[19,27,30,31] This choice aligns with the nature of these studies, which relied on predicting euploidy, where the endpoint was the PGT-A result.

Table 5
Comparison of embryo transfer strategy used and metadata used in datasets.
Sr. no Authors Embryo transfer strategy Meta data inclusion in AI model training
1 Loewke et al.[15] Both fresh ET and FET Maternal age, morphokinetic parameters
2 Miyagi et al.[17] Not mentioned Maternal age, clinical parameters
3 Chen et al.[18] Not applicable, only blastocyst assessment Not applicable
4 Barnes et al.[19] FET Maternal age, morphokinetic parameters
5 Khosravi et al.[20] Not applicable, differentiation between good & bad Not applicable
6 Wang et al.[21] Not applicable, only blastocyst assessment Not applicable
7 Bormann et al. [22] Not applicable, only blastocyst assessment Not applicable
8 Bormann et al.[23] Both fresh ET and FET Not clear
9 Miyagi et al.[24] Not mentioned Morphological parameters, clinical parameters
10 Kanakasabapathy et al.[25] Not applicable, only morphology assessment Not applicable
11 Tran et al.[26] Both fresh ET and FET None
12 Diakiw et al.[27] FET None
13 Giscard d’Estaing et al.[28] Both fresh ET and FET Maternal age, morphokinetic parameters
14 Chavez-Badiola et al.[29] Both fresh ET and FET Maternal age, hours post insemination
15 Huang et al.[30] FET Maternal age, morphokinetic parameters
16 Diakiw et al.[31] FET None
17 Cimadomo et al.[32] FET None
18 Ma et al.[33] FET Maternal age, morphokinetic parameters
19 Onthuam et al.[34] Not mentioned Maternal age and Istanbul grading scores (stage, ICM, and TE)
20 Goldmeier et al.[35] Fresh ET Maternal age, morphokinetic parameters
21 Geller et al.[36] Not mentioned None
22 Berntsen et al.[37] Both fresh ET and FET None
23 Enatsu et al.[38] Not mentioned Maternal age, clinical parameters
24 Liu et al.[39] FET Maternal age, clinical parameters
25 Yuan et al.[40] FET Morphokinetic, morphological and clinical parameters
26 Huang et al.[62] Both fresh ET and FET None

ET: Embryo Transfer, FET: Frozen Embryo Transfer, ICM: Inner Cell Mass, TE: Trophectoderm.

Five studies do not clearly mention the embryo transfer strategy, as shown in Figure 10. Since the endpoints of these studies were the classification or grading of blastocysts, the embryo transfer strategy was not applicable.[18,20–22,25]

Figure 10: Embryo transfer strategy employed in datasets. Seven studies included data from both fresh and frozen transfers, while five failed to mention the embryo transfer strategy.

Dataset Inclusion and Exclusion Criteria

In the study by Loewke et al., we noted the inclusion of discarded or mislabelled blastocysts in the negative outcome class.[15] However, assuming negative outcomes for all discarded blastocysts may be flawed, as supernumerary blastocysts are often discarded without known outcomes, especially when patients withhold consent for freezing.

Four studies failed to clarify the embryo transfer strategy, i.e., fresh or FET, employed in the datasets collected, even though this strategy itself influences the IVF cycle outcome.[17,24,36,38] Including such data has the potential to bias model training and overall performance.

Achieving a balance between inclusion and exclusion criteria is a pivotal factor in constructing a dataset that accurately represents the intended predictive endpoint.

Inclusion of Metadata in Training AI Model

Five studies that concentrated on assigning a grade to blastocysts or categorising them as good or poor consciously chose not to incorporate any metadata, as shown in Figure 11.[18,21,22,25,32] This decision aligns with the nature of their task, in which the role of metadata may not be considered significant.

Figure 11: Different types of metadata used along with the blastocyst image. Out of 13 studies that used metadata, two used only maternal age. Notably, three studies used other clinical parameters apart from maternal age. ICM: Inner Cell Mass, TE: Trophectoderm.

Surprisingly, a few studies that specifically target predicting embryo viability and implantation potential omitted patient metadata altogether.[17,26,31,36,37] In contrast, one noteworthy study trained its AI model on a comprehensive set of clinical parameters, such as maternal age, number of embryo transfers, anti-Müllerian hormone concentration, day-3 blastomere number, grade on day 3, embryo cryopreservation day, ICM, TE, average diameter, and body mass index (BMI), over and above the morphological parameters of the embryo, as shown in Figure 11.[24] A more recent model also integrated maternal age with pseudo-features derived from Istanbul grading to enhance predictive performance.[34] Notably, automated extraction of morphometric parameters has demonstrated a positive association with implantation potential.[35]

Another recent study employed a deep learning pipeline using day-5 embryo images along with maternal age and pseudo-features derived from Istanbul grading scores (stage, ICM, and TE), combining them with self-supervised and Generative Adversarial Network (GAN)-augmented models to enhance viability prediction. This approach reflects a growing trend toward hybrid models that merge visual and clinical data to improve predictive accuracy.[34]

Additionally, seven studies that utilise time-lapse images and videos integrated embryo morphokinetic data into their model training, recognising the valuable insights these parameters provide, as shown in Figure 11. This diversity in metadata utilisation highlights the need for standardised practices in optimising models for predicting implantation potential.

Among studies using metadata for AI training, three utilised maternal age alongside other clinical data, as shown in Figure 11.[29,38,39] Four studies relied on morphokinetic data.[22,23,28,37] This diverse approach showcases ongoing experimentation to boost AI models for predicting embryo outcomes, on the premise that metadata should enhance model accuracy.

Public Availability of Data

Of the 26 studies reviewed, a significant portion lacked transparency regarding dataset availability. Specifically, 14 studies did not explicitly state whether their datasets were accessible to other researchers. Among the remaining 12, seven indicated that anonymised datasets could be obtained upon request, reflecting a willingness to share data while ensuring participant privacy.[17,22,30,37,39,40] However, four studies explicitly denied public access to their embryo images and patient data, citing ethical and privacy reasons and limiting access to collaborators.[19,20,27,33] This disparity highlights the ongoing challenge of balancing data sharing with ethical considerations in reproductive research.

A consolidated overview of dataset characteristics, including but not limited to study type, dataset size, span of data collection, image resolution, timing of image capture, metadata inclusion, and clinical endpoints, is presented in Table 6, reflecting the wide heterogeneity observed across the 26 included studies.

Table 6
Consolidated overview of dataset characteristics in studies on AI-based blastocyst assessment.
Title Span of data collection Dataset size Image capture timing Number of clinics involved Image capture source Image format Types of camera used Resolution of image Number of focal planes Metadata used by model? Dataset sampling (% negative vs % positive class) ET strategy Ground truth Public data availability
1. Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF 2011 to 2020 5010 Not mentioned (D5) 10 Optical light microscope Static Optical light microscope. Make not mentioned. 480 pixels x 480 pixels 1 No 42% : 58% FET PGT-A result Not mentioned
2. Robust and generalizable embryo selection based on artificial intelligence and time-lapse image sequences 2011 to 2019 115832 Approx. 108 to 140 hpi (D5, D6) 18 Time lapse Static 2 Types - Embryoscope, Embryoscope Plus 256 pixels x 256 pixels 1 Yes (morphokinetic annotations) 71% : 29% Mix FH at 7 weeks Restricted access - anonymised available on request
3. Deep learning as a predictive tool for foetal heart pregnancy following time-lapse incubation and blastocyst transfer 2014 to 2018 10638 Not mentioned (D5) 8 Time lapse Video 2 Types - Embryoscope, Embryoscope Plus Whole video Multiple No 92% : 8% Mix FH at 7 weeks Not mentioned
4. Characterization of an artificial intelligence model for ranking static images of blastocyst stage embryos. 2015 to 2020 5923 D5, D6, D7 11 Optical light microscope + TL Static 8 Types - Olympus IX71, Nikon Diaphot 300, Olympus IX73, Olympus IX70, Nikon Diaphot, NIKON Eclipse TE300, Nikon SMZ 800 (Stereo Zoom), Embryoscope 224 pixels x 224 pixels, 112 pixels x 112 pixels, 56 pixels x 56 pixels 1 Yes (maternal age and day of image capture) Not clear Mix FH 6 to 8 weeks Not mentioned
5. A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy - a retrospective model development and validation study 2012 to 2017 10378 110 hpi 2 Time lapse Static 2 Types - EmbryoScope, Embryoscope Plus 500 pixels x 500 pixels 1 Yes (morphokinetic annotations, maternal age) 57% : 43% FET FH 6 to 8 weeks Not publicly available
6. Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization 2012 to 2017 1764 110 hpi 1 Time lapse Static 1 Type -Embryoscope TL 500 pixels x 500 pixels 7 Yes (maternal age) 50% : 50% NA live birth Not publicly available
7. Consistency and objectivity of automated embryo assessments using deep neural networks Not mentioned 742 113 hpi 1 Time lapse Static 1 Type - Embryoscope TL Not mentioned 1 Yes (morphokinetic annotations) Not clear NA Embryo grade Not mentioned
8. Performance of a deep learning based neural network in the selection of human blastocysts for implantation Not mentioned 3469 113 ± 0.5 hpi 1 Time lapse Static 1 Type - Embryoscope TL 210 pixels x 210 pixels 1 Yes (morphokinetic annotations) 52% : 48% Mix FH at 6 weeks Restricted access - anonymised available on request
9. A machine learning system with reinforcement capacity for predicting the fate of an ART embryo 2013 to 2017 854 Not mentioned (D5, D6) 1 Time lapse Static + video 1 Type - Embryoscope TL 1000 pixels x 1000 pixels Multiple Yes (morphokinetic annotations) 29% : 71% FET Live birth Not mentioned
10. Towards Automation in IVF: Pre-Clinical Validation of a Deep Learning-Based Embryo Grading System during PGT-A Cycles 2013 to 2022 3604 D5, D6, D7 1 Time lapse Video 1 Type - Embryoscope TL NA (Not available) Multiple No Not clear FET live birth Not mentioned
11. A novel system based on artificial intelligence for predicting blastocyst viability and visualizing the explanation 2011 to 2019 19342 D5 or D6 1 Optical light microscope Static Optical light microscope. Make not mentioned. 480 pixels x 640 pixels 1 Yes (maternal age, clinical parameters) 61% : 39% Not mentioned FHB Not mentioned
12. Embryo Ranking Intelligent Classification Algorithm (ERICA): artificial intelligence clinical assistant predicting embryo ploidy and implantation 2015 to 2019 946 130.0 to 131.8 hpi (D6) 3 Optical light microscope Static 2 Types - Olympus IX71, Olympus IX73 640 pixels x 480 pixels or 807 pixels x 603 pixels 1 Yes (maternal age, hpi) 44% : 56% Not mentioned βHCG Not mentioned
13. Feasibility of artificial intelligence for predicting live birth without aneuploidy from a blastocyst image 2008 to 2017 160 115 hpi 1 Not mentioned Static Make not mentioned 100 pixels x 100 pixels 1 No 50% : 50% FET live birth Not mentioned
14. Development and evaluation of inexpensive automated deep learning-based imaging systems for embryology Not mentioned 542 113 hpi 2 Optical light microscope Static 1 Type - Proprietary optical system mounted on smart phone Not mentioned 1 No Not clear NA embryo grade Not mentioned
15. Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study 2016 to 2020 17580 Not mentioned (D5) 1 Optical light microscope Static Optical light microscope. Make not mentioned. 1024 pixels x 768 pixels 2 Yes (maternal age, clinical parameters) 64% : 36% FET Live birth Restricted access - anonymised available on request
16. A Deep Learning Framework Design for Automatic Blastocyst Evaluation With Multifocal Images 2017 to 2018 1025 116 + - 1 hpi 1 Time lapse Static 1 Type - ASTEC Time lapse - CCM-iBIS 1280 pixels x 960 pixels 11 No 51% : 49% NA embryo grade Not mentioned
17. Predicting a live birth by artificial intelligence incorporating both the blastocyst image and conventional embryo evaluation parameters 2009 to 2017 5691 115 or 139 hpi 1 Not mentioned Static Optical light microscope. Make not mentioned. 50 pixels x 50 pixels 1 Yes (maternal age, 10 clinical parameters) 28% : 72% Not mentioned Live birth Not mentioned
18. Using deep learning to predict the outcome of live birth from more than 10,000 embryo data 2018 to 2019 15434 105 hpi to 125 hpi (D5) 1 Time lapse Static 1 Type - Embryoscope Plus 224 pixels x 224 pixels 1 No 54% : 46% Mix Live birth Anonymised available on request
19. Using Deep Learning with Large Dataset of Microscope Images to Develop an Automated Embryo Grading System 2014 to 2018 171239 112 to 116 hours (Day 5) or 136 to 140 hours (Day 6) 1 Optical light microscope Static (1 image - 1 Focal plane) 1 Type - Optical Light Microscope (Zeiss Axio Observer Z1) 264 pixels x 198 pixels 1 No 43% : 57% NA Embryo grade Not mentioned
20. An artificial intelligence model (euploid prediction algorithm) can predict embryo ploidy status based on timelapse data 2018 to 2019 1803 D5 - 110-116 h D6 - 132.5-136 h 1 Time lapse Video 1 Type - Embryoscope Plus TL Not specified (video) Multiple Yes (morphokinetic annotations, maternal age) Not clear FET PGT-A result Anonymised available on request
21. Development of an artificial intelligence based model for predicting the euploidy of blastocysts in PGT-A treatments 2019 to 2022 1036 D5, D6 1 Time lapse Video 1 Type - Embryoscope Plus TL Not mentioned Multiple Yes (morphokinetic annotations, morphological parameters, maternal age) 51% : 49% NA Embryo grade Anonymised available on request
22. An Artificial Intelligence-Based Algorithm for Predicting Pregnancy Success Using Static Images Captured by Optical Light Microscopy 2016 to 2019 361 Not mentioned (D5) 4 Optical light microscope Static Optical light microscope. Make not mentioned. 224 pixels x 224 pixels 1 No 42% : 58% Not mentioned FHB Not applicable
23. Development of an artificial intelligence model for predicting the likelihood of human embryo euploidy based on blastocyst images from multiple imaging systems during IVF 2011 to 2020 1001 Not mentioned (D5, D6) 10 Optical light microscope + TL Static 6 Types - Cooper Surgical Saturn5, Saturn3, Vitrolife Octax, Hamilton Thorne LYKOS, EmbryoScope TL, Merck GERI TL 480 pixels x 480 pixels 1 Yes (maternal age) 22% : 78% FET PGT-A result Not publicly available
24. Combined Input Deep Learning Pipeline for Embryo Selection for In Vitro Fertilization Using Light Microscopic Images and Additional Features 2018 to 2022 1194 Not mentioned (D5) 1 Optical light microscope Static Optical light microscope. Make not mentioned. 224 pixels x 224 pixels 1 Yes (maternal age + Istanbul Grading Exp, ICM, TE) 69% : 31% Not mentioned FHB at 6 to 10 weeks Anonymised available on request
25. Enhancing clinical utility-deep learning-based embryo scoring model for non-invasive aneuploidy prediction 2018 to 2021 3405 D5, D6 2 Time lapse Video 1 Type - Embryoscope Plus TL Not specified (video) Multiple Yes (maternal age) 44% aneuploid : 56% (45% euploid + 12% mosaic) FET PGT-A result Not applicable
26. An artificial intelligence algorithm for automated blastocyst morphometric parameters demonstrates a positive association with implantation potential 2014 to 2017 608 D5 Variable (tEB) 3 Time lapse Video 1 Type - Embryoscope TL 224 pixels x 224 pixels Multiple Yes (maternal age) 67.1% : 32.9% Fresh FHB Not publicly available

Comparative summary of dataset characteristics extracted from 26 studies evaluating AI-based blastocyst assessment. The table includes parameters such as dataset size, span of data collection, number of clinics involved in data collection, type of study, public data availability, resolution of image, type of input to the AI model, number of focal planes, types of camera used, timing of image capture, day of image capture, embryo transfer strategy, metadata inclusion in AI model training, primary aim of blastocyst assessment, and the endpoint or ground truth used.

DISCUSSION

Importance of Dataset Size

The impact of the sample size on the performance of AI models for blastocyst assessment is significant and multifaceted. Although a larger sample size intuitively correlates with improved model performance, the quality and distribution of data within the dataset are of paramount importance.[43] Studies have shown that satisfactory model performance can be achieved even with a smaller sample size that contains high-quality data and is representative of the problem domain.[44] However, it is essential to recognise that there exists a point of diminishing returns where the effect sizes and ML accuracies plateau after reaching a certain sample size threshold.[45] This suggests that simply increasing the sample size may not always lead to significant improvements in model performance. Therefore, when evaluating sample size adequacy, it is crucial to consider the quantity, quality, and sampling strategy of data to ensure optimal model performance in blastocyst assessment studies, while also acknowledging the resource constraints involved in obtaining and annotating large datasets.

Importance of Diversity in Training Data

Diverse demographic dataset sources are crucial for building robust and generalisable AI models for blastocyst assessment. Training on varied datasets transcends regional boundaries, ensuring adaptability to global populations and IVF laboratory settings. AI models benefit from capturing the diversity in blastocyst morphology and morphokinetic trends associated with patient ethnicity, enhancing their applicability across clinical settings worldwide.[46,47]

These findings set the stage for a broader discussion of the implications of dataset demographics, emphasising the need for standardised approaches in selecting datasets for AI model training.

In addition to data from diverse demographics, incorporating various image capture mechanisms intuitively seems important. In practice, strategies such as domain adaptation and domain generalisation can address the issue of domain shift and enable seamless integration of these AI models into real-world clinical settings.

Domain adaptation enables models trained in one setting to perform well in others, whereas domain generalisation aims to maintain model performance across multiple domains without specific adaptation.[48,49] Personalised federated learning allows for the training of a shared model with data from multiple clinics, which can then be tailored to individual practices.[50] These strategies ensure that AI models are versatile and customisable in various clinical settings.
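The multi-clinic idea can be made concrete with a minimal sketch of federated averaging, the core aggregation step used in federated learning: each clinic trains locally and shares only its parameters, which a central server averages weighted by local dataset size. The clinic labels, dataset sizes, and two-parameter "models" below are hypothetical, purely for illustration.

```python
# Sketch of federated averaging (FedAvg-style aggregation) across clinics.
# Each clinic contributes (n_samples, locally trained parameter vector);
# the server computes a dataset-size-weighted mean. All values hypothetical.

def federated_average(clinic_updates):
    """clinic_updates: list of (n_samples, parameter_vector) tuples."""
    total = sum(n for n, _ in clinic_updates)
    dim = len(clinic_updates[0][1])
    averaged = [0.0] * dim
    for n, params in clinic_updates:
        for i, w in enumerate(params):
            averaged[i] += (n / total) * w
    return averaged

updates = [
    (1000, [0.2, 0.8]),   # clinic A: large dataset, dominates the average
    (250,  [0.6, 0.4]),   # clinic B
    (250,  [0.4, 0.6]),   # clinic C
]

global_params = federated_average(updates)
print(global_params)  # weighted toward clinic A's parameters
```

In a personalised variant, each clinic would then fine-tune the shared parameters on its own data, tailoring the common model to local practice.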

Importance of Using Multiple Focal Planes for Image Capture

Although it may seem intuitive that AI models for embryo assessment could benefit from using multiple focal planes to better visualise key characteristics, such as the ICM and TE, the current research does not provide a solid basis for this claim. In a recent abstract that explored the performance of an AI model trained on static images at a single focal depth, there was no statistically significant difference in model performance when the model was applied to multifocal time-lapse images at different focal planes.[50,51] Another abstract noted that techniques such as ensemble learning and test-time augmentation can reduce the sensitivity of AI models to different focal planes while maintaining performance.[52]

However, a recent retrospective study of 2555 day 5 blastocysts showed that training the AI model with segmented images highlighting the ICM and TE improved its predictive performance, with the AUC increasing from 0.716 to 0.741.[41] The model was also able to focus on clinically relevant regions in 99% of cases, compared to 86% when trained on unsegmented images. This suggests that providing structured input emphasising biologically important regions can enhance the accuracy of AI models in embryo assessment.

Hence, we currently do not have sufficient evidence to suggest that multifocal images will be better for training AI models, and this needs further research with larger sample sizes.

Implications of Wider Range in Image Capture Timing

Blastocysts are highly metabolically active, and even a slight 3–5 hour gap in assessment timing can significantly affect their grading.[42,53] None of the reviewed papers, especially those using static blastocyst images, explicitly specified whether they incorporated the timing of image capture as temporal metadata alongside the input image when training AI models tasked with assessing and ranking blastocysts. It is worth noting that the Istanbul consensus recommends that blastocyst assessment be performed at 116 ± 1 hours post-insemination.[5]

The importance of this parameter becomes evident when considering scenarios such as comparing a grade 4AA blastocyst on day 5 with a similar grade 4AA blastocyst on day 6. Associating these with a similar score would be inaccurate. For optimal training of AI models in blastocyst assessment, the ideal dataset should use images captured within a consistent time frame, ensuring greater accuracy and comparability of the results.

Ideal Endpoint or Ground Truth for Datasets Used for Blastocyst Assessment

The choice of endpoint in blastocyst assessment studies depends heavily on the primary aim of assessment. When the goal is to rank and prioritise blastocysts for transfer, an ideal endpoint would be the detection of cardiac activity, as it provides a more immediate indicator of viability compared to live birth, which is influenced by numerous confounding factors.[54]

In studies aimed at predicting euploidy, the choice of endpoint is complex. Live birth serves as a comprehensive endpoint, encapsulating the ultimate goal of achieving a successful pregnancy. However, for cases resulting in abortion, assessing the genetic status of the conceptus using CVS can offer valuable insights. By doing so, we can gain a more accurate understanding of the predictive capabilities of the model.

Importance of Addressing Confounders in AI Model Training

Maternal factors, such as very advanced maternal age (>40 years) and high BMI, and uterine factors, such as adenomyosis, endometriosis, and uterine structural anomalies, significantly affect clinical pregnancy and live birth outcomes.[55–57] It is crucial to acknowledge that neglecting these confounders during dataset preparation may introduce biases that affect the robustness of AI model learning.

Training an AI model without accounting for these confounders can negatively affect its performance. For example, if an AI model learns from cases where a blastocyst with favourable morphology does not result in pregnancy due to underlying uterine abnormalities, it may draw incorrect generalisations. This failure to consider the broader clinical context can impair the AI model’s understanding of the true relationship between embryo quality and outcomes. Such misleading data are labelled ‘noisy data’ and can degrade model performance.[58]

Excluding confounders is vital not only for ensuring that the dataset accurately represents clinical reality but also for optimising the learning process of AI models, allowing them to capture those factors attributable to blastocyst morphology alone that influence reproductive outcomes.

Importance of Dataset Split or Class Distribution

Highlighting the significance of mirroring real-world scenarios in dataset class distribution, it is crucial to note that, typically, close to 40% of embryos are classified as good usable blastocysts, whereas close to 60% are categorised as poor blastocysts.[59] A recent study suggested that altering the class distribution from 30:70 to 50:50 does not significantly affect AI model performance.[37] However, another study cautioned against oversampling the negative class (poor embryos), as it introduces a notable negative bias into the AI model.[15] Thus, a closer-to-reality class distribution of approximately 40% good and 60% poor blastocysts seems the logical choice for robust AI training.

Deciding on the correct dataset split for training AI models in blastocyst assessment is a critical consideration. Here is a concise overview of the pros and cons associated with a 50%–50% split and a more practical 40%–60% split.

50%–50% Split of Good and Bad Blastocyst Data

A 50%–50% split ensures a balanced representation of positive and negative outcomes, simplifying the learning environment. Evaluation metrics, such as accuracy and precision, are straightforward because of the equal class distribution. However, this approach may be misaligned with the prevalent blastocyst distribution in practical IVF labs, which is 30%–40% good usable blastocysts and 60%–70% poor blastocysts.[59] This could potentially hinder the real-world applicability of the model. It may also impact the sensitivity-specificity balance of the model.

Sensitivity refers to the AI’s ability to correctly identify a good blastocyst when it is indeed good, whereas specificity denotes its ability to accurately identify a poor blastocyst when it is indeed poor. If the training data does not reflect real-world distribution, the AI model may become biased. For example, the model might develop a high sensitivity for detecting good blastocysts but exhibit lower specificity in identifying poor blastocysts. This imbalance arises because model predictions are influenced by the characteristics of the training data, leading to potential performance discrepancies between different classes.
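Both metrics follow directly from the confusion-matrix counts. The short Python sketch below, using hypothetical counts for a 100-embryo evaluation set with the real-world 40:60 mix, illustrates the imbalance described above: a model can reach high sensitivity for good blastocysts while its specificity for poor ones lags behind.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = fraction of truly good blastocysts identified as good;
    specificity = fraction of truly poor blastocysts identified as poor."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical evaluation on 100 embryos (40 good, 60 poor): the model
# finds 36 of the 40 good embryos but calls 18 poor embryos "good".
sens, spec = sensitivity_specificity(tp=36, fn=4, tn=42, fp=18)
print(round(sens, 2), round(spec, 2))  # 0.9 0.7 — sensitive but less specific
```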

40% Good – 60% Bad Split

Opting for a 40%–60% or 30%–70% split aligns more realistically with the actual prevalence of blastocyst grades in practical IVF labs, being close to the good-blastocyst development rates reported for most IVF labs worldwide.[59] This enhances the relevance of the model to clinical practice by providing a distribution that better mirrors real-world scenarios. Additionally, a more representative split improves the model’s sensitivity for positive outcomes, capturing patterns related to the minority class (good blastocysts). However, this approach introduces challenges in model evaluation, particularly with respect to accuracy interpretation, and there is a potential for an increased risk of false positives.[60] It is important to note that an imbalanced class distribution, if present, can contribute to overfitting, where the model may become biased towards more prevalent grades during training, potentially impacting its generalisation to diverse blastocyst grades in real-world applications.
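One common way to train on a realistic 40:60 distribution without letting the majority class dominate the loss is inverse-frequency class weighting, which most deep learning frameworks accept as a per-class loss weight. The Python sketch below is a generic illustration of this technique, not a method taken from any of the reviewed studies.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    normalised so that the average weight across samples equals 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical 40:60 good/poor label distribution
labels = ["good"] * 40 + ["poor"] * 60
weights = inverse_frequency_weights(labels)
print(weights)  # {'good': 1.25, 'poor': 0.8333...} — minority class upweighted
```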

Importance of Embryo Transfer Strategy in Studies

In studies aiming to associate a predictive score with embryos or assess embryo viability during blastocyst assessment, the choice of embryo transfer strategy used during dataset collection has the potential to impact AI model performance. If the AI model is trained solely on fresh transfers, confounders such as uterine receptivity and endometrial thickness at the time of transfer may distort the ground truth. Failure to account for these confounders can introduce bias during model training. Therefore, an ideal strategy would be to collect datasets exclusively from frozen embryo transfers (FETs), avoiding a mix of fresh and frozen cycles, which could help mitigate confounding factors and enhance model robustness.

Importance of Using Clinical Metadata Along with Images

In the context of metadata utilised by models, it is essential to acknowledge that the implantation potential of an embryo is influenced by various factors, with maternal age being one of the most significant contributors.[61] Models aimed at predicting implantation or assigning a viability score can greatly benefit from incorporating patient age and other relevant clinical parameters into their analysis.

Proposed Characteristics of a Gold Standard Dataset

A gold standard dataset for AI-based blastocyst assessment should be designed to support robust training, validation, and testing of clinically applicable models. Based on insights from existing AI tools, such a dataset should ideally include between 50,000 and 100,000 high-quality images of expanded blastocysts, sourced from multiple clinics to ensure diversity and generalisability. A minimum threshold of 20,000 well-curated images may serve as a baseline when image quality, patient selection and imaging consistency are tightly maintained. These images should be divided into 70% for training, 15% for validation and 15% for testing the AI model.[62]
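The 70/15/15 division should be performed per class (stratified) so that each subset preserves the outcome distribution of the whole dataset. A minimal pure-Python sketch of such a stratified split, applied to a hypothetical 1000-image toy set with a 40:60 positive/negative mix, could look like this:

```python
import random

def stratified_split(items, labels, fracs=(0.70, 0.15, 0.15), seed=42):
    """Split items into train/validation/test subsets while preserving
    the class distribution within each subset."""
    rng = random.Random(seed)
    by_class = {}
    for item, lab in zip(items, labels):
        by_class.setdefault(lab, []).append(item)
    train, val, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n = len(members)
        a = int(fracs[0] * n)           # end of the training chunk
        b = a + int(fracs[1] * n)       # end of the validation chunk
        train.extend(members[:a])
        val.extend(members[a:b])
        test.extend(members[b:])
    return train, val, test

# Hypothetical toy set: 1000 image IDs, 40% positive / 60% negative outcomes
items = list(range(1000))
labels = ["pos" if i < 400 else "neg" for i in items]
train, val, test = stratified_split(items, labels)
print(len(train), len(val), len(test))  # 700 150 150
```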

The dataset should consist exclusively of expanded blastocysts from autologous IVF cycles, with one embryo per image and no other embryos, microtools, or visual distractions in the frame. Each image must clearly display the entire blastocyst in sharp focus, with an identifiable ICM, TE and zona pellucida. Image capture should be performed at a consistent time point between 110 and 116 hours post-insemination (hpi) using a 20× objective on an inverted microscope, giving a total magnification of approximately 200×. A minimum image resolution of 500 × 500 pixels should be maintained. The dataset should include patients aged between 23 and 38 years with normal BMI and exclude cases with pathologies such as endometriosis, adenomyosis, congenital uterine anomalies, or recurrent implantation failure. Only frozen cycles should be included to eliminate variability due to endometrial receptivity. The ground truth for each image should be defined as the presence or absence of foetal cardiac activity between 6 and 8 weeks of gestation.

All images should be annotated with clinical outcome labels for classification purposes. In addition, a smaller subset of 500–2000 images should be structurally annotated using standard annotation tools to clearly mark the ICM, TE and zona pellucida. These detailed annotations are essential for training segmentation models or for developing region-focused explainable AI tools. A class distribution of approximately 40% positive and 60% negative outcomes should be maintained to reflect real world clinical conditions.

LIMITATIONS

This systematic review has a few limitations, including the potential risk of bias in the selected studies. The absence of established risk-of-bias assessment tools introduces uncertainty into the analysis. Despite efforts to mitigate bias through tailored dataset assessments, the lack of standardised tools means the analysis ultimately relies on the quality and transparency of the individual studies.

Another limitation is the review’s exclusive focus on the English language literature, potentially introducing language bias. This choice may exclude relevant studies in other languages, thus limiting the representation of diverse perspectives and findings. Consideration of this bias is crucial for interpreting the comprehensive scope and generalisability of the review.

The lack of a standardised parameter set for comparing databases is a significant limitation. Most reviewed studies lack comprehensive dataset reporting and vary in the parameters covered. This reporting variability poses challenges to the synthesis of information across studies.

Additionally, the diverse endpoints pursued in the reviewed studies, ranging from ranking embryos to predicting live births, complicate the dataset comparison. Varying endpoints hinder direct parallels among datasets, adding complexity to the analysis.

This review was limited to various dataset characteristics and their importance in model training. Ideally, a comprehensive study would compare these characteristics and assess their impact on the AI model’s performance. However, this task is complex and challenging because of the diverse nature of datasets, variability in model architectures, and multifactorial interactions among these elements. Future studies should build on our findings and focus on quantifying how specific dataset parameters influence the model performance.

CONCLUSION

In conclusion, our systematic review emphasises the urgent need for standardised approaches to dataset preparation for AI models used for blastocyst assessment. Datasets should feature diverse patient demographics, balanced negative and positive sampling, uniform image capture timing, uniformity in endpoint, and removal of confounding factors that can directly impact the endpoint. Additionally, consistent embryo transfer strategies and standardised annotations of patient characteristics are essential to achieve reliable endpoints.

Restricted dataset accessibility poses challenges for future researchers and prolongs model development. Our review highlights the urgent need for a standardised dataset of blastocyst images for AI model training and testing. The availability of a gold standard dataset, such as an annotated human blastocyst dataset, would establish a common benchmark for researchers, facilitating accurate comparison of AI models and promoting standardised practices. Ultimately, this initiative would enhance collaboration, optimise model performance, and contribute to the evolution of AI-driven blastocyst evaluation.

Acknowledgement

I want to express my sincere gratitude to everyone who played a crucial role in the successful completion of this research. A special appreciation goes to my Ph.D. guide and mentors, whose unwavering guidance and invaluable insights have shaped this study.

I would like to acknowledge and thank the dedicated team at my clinic for their constant support and collaborative efforts, which have been pivotal to the execution of this research.

Author contributions:

DBP: Conceptualisation, search strategy design, primary data extraction, and manuscript drafting; HD: Data collection, literature screening, and support in data synthesis; SK: Quality assessment of included studies and critical revision of the manuscript; SKV: Interpretation of findings, guidance on methodological framework, and manuscript refinement. All authors reviewed and approved the final version of the manuscript.

Ethical approval:

Institutional Review Board approval is not required.

Declaration of patient consent:

Patient’s consent is not required as there are no patients in this study.

Financial support and sponsorship:

Nil.

Conflicts of interest:

There are no conflicts of interest.

Use of artificial intelligence (AI)-assisted technology for manuscript preparation:

The authors confirm that they have used artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript or image creation. The AI tools utilised during the preparation of this systematic literature review are as follows: Open Knowledge Maps – For literature search, visualisation and text mining, Inciteful – For locating highly cited manuscripts, CiteFast – For automated citation generation, PaperPal – For structural verification and grammar check of the manuscript, Trinka – For checking the grammar of the manuscript, ChatGPT – For alphabetical ordering of references in the manuscript. After utilising the services of these tools, the author(s) thoroughly reviewed and edited the content as necessary, assuming full responsibility for the publication's content.

REFERENCES

1. Medical Image Analysis Based on Deep Learning Approach. Multimed Tools Appl. 2021;80:24365-98.
2. Culture and Selection of Viable Blastocysts: A Feasible Proposition for Human IVF? Hum Reprod Update. 1997;3:367-82.
3. Assessment of Embryo Viability: The Ability to Select a Single Embryo for Transfer—A Review. Placenta. 2003;24:S5-S12.
4. IVF/ICSI Twin Pregnancies: Risks and Prevention. Hum Reprod Update. 2005;11:575-93.
5. The Istanbul Consensus Workshop on Embryo Assessment: Proceedings of an Expert Meeting. Hum Reprod. 2011;26:1270-83.
6. Randomized Comparison of Two Different Blastocyst Grading Systems. Fertil Steril. 2006;85:559-63.
7. The Clinical Use of Time-Lapse in Human-Assisted Reproduction. Ther Adv Reprod Health. 2020;14:263349412097692.
8. Characterization of a Top Quality Embryo, a Step Towards Single-Embryo Transfer. Hum Reprod. 1999;14:2345-9.
9. A Survey on Deep Learning in Medical Image Analysis. Med Image Anal. 2017;42:60-88.
10. Recent Advances and Clinical Applications of Deep Learning in Medical Image Analysis. Med Image Anal. 2022;79:102444.
11. Reporting on the Value of Artificial Intelligence in Predicting the Optimal Embryo for Transfer: A Systematic Review Including Data Synthesis. Biomedicines. 2022;10:697.
12. Image Processing Approach for Grading IVF Blastocyst: A State-of-the-Art Review and Future Perspective of Deep Learning-Based Models. Appl Sci. 2023;13:1195.
13. Embryo Selection Through Artificial Intelligence Versus Embryologists: A Systematic Review. Hum Reprod Open. 2023;2023.
14. Data Set Quality in Machine Learning: Consistency Measure Based on Group Decision Making. Appl Soft Comput. 2021;106:107366.
15. Characterization of an Artificial Intelligence Model for Ranking Static Images of Blastocyst Stage Embryos. Fertil Steril. 2022;117:528-35.
16. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. Syst Rev. 2021;10:89.
17. Feasibility of Artificial Intelligence for Predicting Live Birth Without Aneuploidy from a Blastocyst Image. Reprod Med Biol. 2019;18:204-11.
18. Using Deep Learning with Large Dataset of Microscope Images to Develop an Automated Embryo Grading System. Fertil Reprod. 2019;1:51-6.
19. A Non-Invasive Artificial Intelligence Approach for the Prediction of Human Blastocyst Ploidy: A Retrospective Model Development and Validation Study. Lancet Digit Health. 2023;5:e28-e40.
20. Deep Learning Enables Robust Assessment and Selection of Human Blastocysts After In Vitro Fertilization. NPJ Digit Med. 2019;2:21.
21. A Deep Learning Framework Design for Automatic Blastocyst Evaluation With Multifocal Images. IEEE Access. 2021;9:18927-34.
22. Consistency and Objectivity of Automated Embryo Assessments Using Deep Neural Networks. Fertil Steril. 2020;113:781-7.e1.
23. Performance of a Deep Learning Based Neural Network in the Selection of Human Blastocysts for Implantation. Elife. 2020;9:e55301.
24. Predicting a Live Birth by Artificial Intelligence Incorporating Both the Blastocyst Image and Conventional Embryo Evaluation Parameters. Artif Intell Med Imaging. 2020;1:94-107.
25. Development and Evaluation of Inexpensive Automated Deep Learning-Based Imaging Systems for Embryology. Lab Chip. 2019;19:4139-45.
26. Deep Learning as a Predictive Tool for Fetal Heart Pregnancy Following Time-Lapse Incubation and Blastocyst Transfer. Hum Reprod. 2019;34:1011-8.
27. Development of an Artificial Intelligence Model for Predicting the Likelihood of Human Embryo Euploidy Based on Blastocyst Images from Multiple Imaging Systems During IVF. Hum Reprod. 2022;37:1746-59.
28. A Machine Learning System with Reinforcement Capacity for Predicting the Fate of an ART Embryo. Syst Biol Reprod Med. 2021;67:64-78.
29. Embryo Ranking Intelligent Classification Algorithm (ERICA): Artificial Intelligence Clinical Assistant Predicting Embryo Ploidy and Implantation. Reprod Biomed Online. 2020;41:585-93.
30. An Artificial Intelligence Model (Euploid Prediction Algorithm) Can Predict Embryo Ploidy Status Based on Time-Lapse Data. Reprod Biol Endocrinol. 2021;19:185.
31. An Artificial Intelligence Model Correlated with Morphological and Genetic Features of Blastocyst Quality Improves Ranking of Viable Embryos. Reprod Biomed Online. 2022;45:1105-17.
32. Towards Automation in IVF: Pre-Clinical Validation of a Deep Learning-Based Embryo Grading System During PGT-A Cycles. J Clin Med. 2023;12:1806.
33. Enhancing Clinical Utility: Deep Learning-Based Embryo Scoring Model for Non-Invasive Aneuploidy Prediction. Reprod Biol Endocrinol. 2024;22:58.
34. Combined Input Deep Learning Pipeline for Embryo Selection for In Vitro Fertilization Using Light Microscopic Images and Additional Features. J Imaging. 2025;11:13.
35. An Artificial Intelligence Algorithm for Automated Blastocyst Morphometric Parameters Demonstrates a Positive Association with Implantation Potential. Sci Rep. 2023;13:12345.
36. An Artificial Intelligence-Based Algorithm for Predicting Pregnancy Success Using Static Images Captured by Optical Light Microscopy During Intracytoplasmic Sperm Injection. J Hum Reprod Sci. 2021;14:288-92.
37. Robust and Generalizable Embryo Selection Based on Artificial Intelligence and Time-Lapse Image Sequences. PLoS One. 2022;17:e0262661.
38. A Novel System Based on Artificial Intelligence for Predicting Blastocyst Viability and Visualizing the Explanation. Reprod Med Biol. 2022;21:e12443.
39. Development and Evaluation of a Live Birth Prediction Model for Evaluating Human Blastocysts from a Retrospective Study. Elife. 2023;12:e83662.
40. Development of an Artificial Intelligence Based Model for Predicting the Euploidy of Blastocysts in PGT-A Treatments. Sci Rep. 2023;13:2322.
41. Improved Prediction of Clinical Pregnancy Using Artificial Intelligence with Enhanced Inner Cell Mass and Trophectoderm Images. Sci Rep. 2024;14:3240.
42. The Quiet Embryo Hypothesis: 20 Years On. Front Physiol. 2022;13:899485.
43. Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Appl Sci. 2021;11:796.
44. Impact of Quality, Type and Volume of Data Used by Deep Learning Models in the Analysis of Medical Images. Inform Med Unlock. 2022;29:100911.
45. Evaluation of a Decided Sample Size in Machine Learning Applications. BMC Bioinform. 2023;24:48.
46. Blastocyst Formation Rate for Asians Versus Caucasians and Within Body Mass Index Categories. J Assist Reprod Genet. 2020;37:933-43.
47. Ethnicity and Assisted Reproductive Technologies. Clin Pract. 2012;9:651-8.
48. A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis. Domain Adapt Represent Transf Afford Healthc AI Resour Divers Glob Health. 2021;12968:3-13.
49. Domain Adaptation for Medical Image Analysis: A Survey. IEEE Trans Biomed Eng. 2021;69:1173-85.
50. Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations Without Sharing Patient Data. Sci Rep. 2020;10:12598.
51. P-306 Generalizable AI Model for Microscopic and Timelapse Multifocal Embryo Images. Hum Reprod. 2023;38:dead093.664.
52. P-171 Sensitivity Analysis of an Embryo Grading Artificial Intelligence Model to Different Focal Planes. Hum Reprod. 2022;37:deac107.166.
53. Assessment of Human Embryo Development Using Morphological Criteria in an Era of Time-Lapse, Algorithms and “OMICS”: Is Looking Good Still Important? Mol Hum Reprod. 2016;22:704-18.
54. Human Embryo Culture Media Comparisons. Embryo Cult. 2012;911:367-86.
55. Cumulative Live Birth Rates and Number of Oocytes Retrieved in Women of Advanced Age. A Single Centre Analysis Including 4500 Women ≥38 Years Old. Hum Reprod. 2018;33:2010-7.
56. Effect of Body Mass Index on IVF Treatment Outcome: An Updated Systematic Review and Meta-Analysis. Reprod Biomed Online. 2011;23:421-39.
57. Effect of Endometriosis on IVF/ICSI Outcome: Stage III/IV Endometriosis Worsens Cumulative Pregnancy and Live-Born Rates. Hum Reprod. 2005;20:3130-5.
58. Automated Detection of Poor-Quality Data: Case Studies in Healthcare. ProQuest. 2012;11:18005.
59. Hum Reprod Open. 2017;2017:hox01.
60. Imbalanced Class Distribution and Performance Evaluation Metrics: A Systematic Review of Prediction Accuracy for Determining Model Performance in Healthcare Systems. PLOS Digit Health. 2023;2:e0000290.
61. Effect of Maternal Age on the Outcomes of In Vitro Fertilization and Embryo Transfer (IVF-ET). Sci China Life Sci. 2012;55:694-8.
62. Using Deep Learning to Predict the Outcome of Live Birth from More Than 10,000 Embryo Data. BMC Pregnancy Childbirth. 2022;22.