Dissecting Datasets Used by Artificial Intelligence or Deep Learning Models for Morphological Assessment of Blastocysts: A Systematic Review

*Corresponding author: Doel Bose Pande, Department of Reproductive Medicine, Indore Infertility Clinic, Indore, India. doelpande@gmail.com
How to cite this article: Pande DB, Deshmukh H, Kandari S, Verma SK. Dissecting Datasets Used by Artificial Intelligence or Deep Learning Models for Morphological Assessment of Blastocysts: A Systematic Review. Fertil Sci Res. 2025;12:30. doi: 10.25259/FSR_28_2025
ABSTRACT
This systematic review scrutinised the datasets used to train Artificial Intelligence (AI) tools developed for automated blastocyst assessment. It is widely acknowledged that the quality of these datasets significantly influences the performance of the AI models. Analysis of datasets from 26 studies highlighted considerable variation in key dataset parameters such as dataset size, data diversity, image quality, image capture mechanism and timing, class distribution, dataset endpoints and metadata usage. Some models incorporate morphokinetic or morphometric annotations and clinical metadata, whereas others rely solely on single-point static images. Many studies lack crucial information, such as image capture timing, embryo transfer strategy, and vital information related to the removal of confounding factors, such as uterine factors, hindering cross-study comparisons. Standardisation of datasets is vital for accurate assessment and comparison of commercially available AI models used for blastocyst assessment. The absence of standardised parameters and the failure to remove confounding variables emphasise the need for greater transparency and standardisation in dataset creation and reporting. Future research should prioritise constructing a robust, gold-standard, large dataset that includes diverse imaging data and excludes confounding factors. In the absence of such a dataset, comparison of AI models remains highly subjective.
Keywords
Artificial intelligence
Blastocyst assessment
Computer vision in medical imaging
Deep learning
Embryo assessment
Machine learning
INTRODUCTION
In recent years, the utilisation of Artificial Intelligence (AI) and Computer Vision in medical image processing has witnessed notable advancements. Studies such as that of Puttagunta et al.[1] have highlighted the substantial promise of AI in automating complex image analysis, making it a compelling choice for applications that rely heavily on visual data, such as radiology.
In vitro fertilisation (IVF) and the introduction of extended culture media have facilitated the extensive adoption of blastocyst culture.[2] Blastocyst culture enables the selection of the most robust embryo from the cohort and makes single-blastocyst transfers possible.[3] The main benefit of elective single-embryo transfer is to avoid transferring more than one embryo that could result in multiple gestations, which have increased antenatal complications and risks.[4] Central to this paradigm shift is the essential process of blastocyst assessment and selection using visual morphological data of an embryo, which has a substantial influence on the effectiveness and outcomes of IVF procedures.
Currently, this assessment relies predominantly on manual visual evaluation conducted by experienced embryologists. An embryologist usually performs this assessment at a fixed time post-insemination, also known as hours post-insemination or hpi.[5] An experienced embryologist observes various morphological features of blastocysts within the cohort to compare and determine the grade of the blastocysts. The most common scoring system for grading blastocysts is the Gardner scoring system.[6] Most clinics have an image capture system attached to an optical microscope, which allows embryologists to capture images of blastocysts for subsequent assessment. Cost considerations pose a significant barrier to the widespread adoption of time-lapse incubators for continuous embryo monitoring in clinical settings. Only a limited number of facilities worldwide have incorporated this technology. Using a time-lapse system, an embryologist can visualise individual embryos in culture at different time points without removing them from the incubator.[7]
At the core of the process of embryo selection is the concept of understanding the morphology of a blastocyst and comparing the morphology with others in the cohort to grade and rank the blastocyst.[8] This enables embryologists to choose the best embryo for transfer or to freeze it for later use. Embryologists who perform this task regularly learn and gather experience in embryo grading based on the number of embryos they grade and the known outcome or endpoint of the embryo transfer. This cyclic process enables embryologists to perform better as they gain more experience. Most embryologists worldwide use the Gardner scoring system to score three blastocyst parameters: expansion grade, inner cell mass (ICM) quality, and trophectoderm (TE) quality.[3]
In recent years, significant progress has been made in leveraging machine learning (ML) algorithms and computer vision to automate tasks that were traditionally performed manually. This advancement extends to the domain of medical imaging, where the acquisition and analysis of image data may play a crucial role in clinical decision-making. AI models trained using either supervised or unsupervised learning methods have been used to enhance these processes.[9]
Given that these models operate at the raw pixel level, over time, they possess the capability to learn a spectrum of features, ranging from macroscopic attributes such as embryo/cell shape and size to finer details such as texture and pattern, all derived from the training data.[10]
Various AI models have been tested in the last few years for blastocyst assessment, with varying degrees of accuracy.[11] AI models have the potential to learn specific image features by analysing the thousands of images used to train them at very high processing speed, and to assist an embryologist by either ranking the embryos or associating a predictive score with each embryo, acting as a decision-support tool.[12] A systematic review by Salih et al.[13] indicated that AI models demonstrated a median accuracy of 77.8% in predicting clinical pregnancy using patient clinical treatment information, in contrast to 64% when performed by embryologists alone. When both images/time-lapse and clinical information were integrated, the AI models achieved a median accuracy of 81.5%, compared with a median accuracy of 51% for clinical embryologists.
The effectiveness of AI models used for blastocyst assessment depends substantially on the datasets employed for training.[14] The capacity to assess the comparative performance of various commercially available AI models depends on their adherence to comparable standards in the input training data. This includes considerations such as the quality of images fed to AI systems, disparity in image capture tools, similarity in the timing of image capture, diversity in image resolution, uniformity in the use of image annotations and clinical metadata, and meticulous collection of similar endpoints in terms of either foetal heartbeat or live birth. The datasets used to train AI models, especially those trained to predict implantation or live birth, should ideally exclude confounders such as maternal age, embryo transfer strategy (fresh or frozen transfer), uterine factors, and severe male factors. The split of the dataset between negative and positive outcomes further shapes the influence of the dataset on the performance of AI models in blastocyst morphological assessment.[15]
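The confounder exclusions described above can be expressed as a simple inclusion filter applied when assembling a training dataset. The sketch below is purely illustrative: the field names and the maternal-age cutoff are assumptions, not values drawn from any study in this review.

```python
# Hypothetical inclusion filter mirroring the confounder exclusions discussed
# above for datasets predicting implantation or live birth. Field names and
# the maternal-age cutoff are illustrative assumptions only.
def include_cycle(cycle: dict) -> bool:
    return (
        not cycle.get("uterine_abnormality", True)      # exclude uterine factors
        and not cycle.get("severe_male_factor", True)   # exclude severe male factor
        and cycle.get("maternal_age", 99) < 38          # illustrative age cutoff
        and cycle.get("transfer_type") == "frozen"      # single transfer strategy
    )

cycles = [
    {"uterine_abnormality": False, "severe_male_factor": False,
     "maternal_age": 31, "transfer_type": "frozen"},
    {"uterine_abnormality": True, "severe_male_factor": False,
     "maternal_age": 31, "transfer_type": "frozen"},
]
eligible = [c for c in cycles if include_cycle(c)]
print(len(eligible))  # 1
```

Note that the defaults in `cycle.get` treat missing fields as grounds for exclusion, a conservative choice when clinical records are incomplete.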
This systematic review highlights the need for standardised datasets, as variations in key characteristics hinder meaningful comparisons across AI models. Therefore, this systematic review seeks to address the following research questions: What are the critical differences in dataset characteristics used to train AI models for automated blastocyst assessment, and how could creating a universal dataset standard enable reliable comparisons of the accuracy of various AI models?
MATERIAL AND METHODS
The systematic review methodology was in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.[16] The research question was formulated based on the objectives of this review.
Search Strategy
A comprehensive literature search was conducted between 28th June 2023 and 5th July 2023 in PubMed and Elsevier Library using the keywords ‘Deep Learning’ OR ‘Machine Learning’ OR ‘Artificial Intelligence’ OR ‘Computer Vision’ AND ‘Blastocyst’ OR ‘Embryo.’ An additional search was performed on 15th January 2025 to update the existing literature.
Study Selection
The initial search retrieved 384 studies. The inclusion criteria limited the review to articles published in the last 10 years, resulting in 362 studies. After excluding 32 duplicates, the remaining studies were screened to include only studies published in English and those based on human blastocysts, leaving 186 studies for screening.
Four other eligible studies were identified through thorough forward and backwards citation mining of the selected relevant literature by employing the Open Knowledge Maps Tool on 29th August 2024.
Excluded Studies
Articles related to the use of AI models in the assessment of cleavage-stage embryos (day 3 embryos) and oocyte morphology were excluded. Additionally, generic narratives on the use of AI in IVF labs and gamete assessments were excluded. Studies that used AI for the association of clinical parameters and live births with proteomic, transcriptomic, and radiomic models were also excluded. Furthermore, we excluded studies that solely aimed to assess the efficacy of existing AI models, such as the Intelligent Data Analysis (iDA) and Known Implantation Data (KID) scores, on specific datasets annotated by the authors. These studies did not involve training the AI models but rather focused on applying pre-trained models to datasets with clinical or morphological annotations.
Primary selection was performed based on title and abstract screening using the PRISMA guidelines, which resulted in 26 eligible studies.
Data Extraction
Data extraction was independently performed by two reviewers, Doel Bose and Sayali Kandari (DB and SK). A consensus was reached through discussion, and a third reviewer, Sandeep Kumar Verma (SKV), resolved any persistent disagreements.
A standardised spreadsheet was used for meticulous data extraction, facilitating the systematic collection of essential information from each included study. The key parameters were as follows:
Dataset size: Total number of samples/images in each dataset.
Span of data collection: Span of data collection in years.
Dataset source clinics: Clinic(s) or providers of the dataset.
Image capture technique: Static image or time-lapse video.
Image resolution: Pixel dimensions or image quality within the captured dataset.
Image/video capture device: Camera/device vendor and type.
Metadata with image: Maternal age and clinical parameters used to train the AI model.
Dataset outcome or ground truth or endpoint: Was the outcome measured in terms of live birth or cardiac activity? If cardiac activity, was it measured at 6 or 8 weeks?
Day of image capture: Blastocyst image capture day (D5, D6, or D7).
Dataset capture timing: When was the image captured in terms of hpi.
Number of focal planes used for image/video capture: The number of focal planes utilised in capturing images or videos of embryos intended for AI model training.
Dataset class distribution: The distribution of images across different classes or categories, that is, the percentage of images with a negative outcome and the percentage of images with a positive outcome.
Type of embryo transfer: Whether images are related to fresh embryo transfer, frozen embryo transfer (FET), or a combination of both strategies.
Public availability of data: Whether the datasets were publicly available for use by other researchers.
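The parameters above can be thought of as a standardised record that every study could report. The following minimal schema is hypothetical, written only to illustrate what such a record might look like; all field names are assumptions, and the example values are taken from what this review reports for the Khosravi et al. dataset in Tables 1-3.

```python
# A minimal, hypothetical schema for the dataset parameters extracted in this
# review. Standardising such a record across studies would make the datasets
# behind different AI models directly comparable. Field names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BlastocystDatasetRecord:
    dataset_size: int                    # filtered images used for training/validation
    collection_span: Optional[str]       # e.g. "2012-2017"
    source_clinics: int                  # number of clinics contributing data
    capture_technique: str               # "static" or "time-lapse"
    image_resolution: Optional[str]      # e.g. "500x500"
    capture_device: Optional[str]        # camera/device vendor and type
    endpoint: str                        # e.g. "live birth", "FHB at 7 weeks"
    capture_day: str                     # "D5", "D6", "D7" or a combination
    capture_timing_hpi: Optional[str]    # e.g. "110-116"
    focal_planes: int = 1
    class_distribution: Optional[str] = None   # "% negative : % positive"
    transfer_type: Optional[str] = None        # "fresh", "frozen", or "both"
    uses_clinical_metadata: Optional[bool] = None
    publicly_available: bool = False

# Example populated from the values this review reports for Khosravi et al.
khosravi = BlastocystDatasetRecord(
    dataset_size=1764, collection_span="2012-2017", source_clinics=1,
    capture_technique="time-lapse", image_resolution="500x500",
    capture_device="EmbryoScope", endpoint="good vs. bad embryo classification",
    capture_day="D5", capture_timing_hpi="110", focal_planes=7,
)
```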
Quality Assessment
Two authors independently reviewed the abstracts and excluded those that did not meet the inclusion criteria. The selected papers were assigned to both reviewers [Doel Bose (DB) and Sayali Kandari (SK)], with conflicts resolved by a third author, Hemant Deshmukh (HD), through majority voting. Data extraction and quality assessment were conducted independently by two authors, and disagreements were resolved by a third author. Since the majority of studies in our review did not strictly qualify as medical interventions, a more tailored approach was chosen: the risk of bias in datasets was assessed using predefined criteria recorded in comparison spreadsheets.
RISK OF BIAS ASSESSMENT
Two independent reviewers (DB and SK) rigorously evaluated the risk of bias in key domains, including the conflicting interests of authors, study design, participant selection, confounding variables, and outcome measurements. Discrepancies were resolved through arbitration and by reviewer Sandeep Kumar Verma (SKV) when needed.
OUTCOME
During the screening phase, both reviewers meticulously examined the abstracts, resulting in the exclusion of 137 studies deemed irrelevant to the research questions. Further scrutiny of the remaining pool led to the assessment of 49 eligible studies. Following a thorough evaluation of the literature abstracts and titles, 22 studies that precisely met the research question criteria were identified.
In addition to the systematic screening process, advanced tools like Research Rabbit and Connected Papers were utilised. These tools, employed to visualise the network of papers, proved instrumental in identifying any potentially relevant studies that might have been overlooked through conventional search methods.
Furthermore, through backwards and forward citation mining using the above tools, four more studies were added, resulting in a total of 26 studies included in this systematic review. The distribution of results at each stage is presented graphically in Figure 1.

- PRISMA flow diagram of study selection for this systematic review of the use of artificial intelligence for blastocyst assessment. The literature search included studies published since 2013, written in English, and conducted on human blastocysts.
MAIN RESULTS
In our review of 26 studies on blastocyst assessment using AI, the dataset sizes varied widely from 160 images to 171,239 images.[17,18] The dataset source showed notable variations, with fifteen studies relying on a single clinic’s data, thus potentially limiting adaptability.
The timing of blastocyst image capture varied among studies. Notably, seven studies did not specify the timing of image capture on day 5, and seven had a large window for capturing blastocyst images from D5 to D7 (104–130 hours). In contrast, only seven studies captured blastocyst images within the recommended fixed window of 110–116 hours post-insemination to maintain consistency of the input data.[19–25] This highlights the need for standardisation to obtain accurate and comparable blastocyst assessment results.
The distribution of dataset classes and sampling strategies exhibited considerable variation, ranging from the common 40%–50% distribution of good embryos and 50%–60% distribution of poor embryos to an extreme case featuring only 8% good and 92% poor embryos.[26]
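The extreme 8% : 92% split matters because raw accuracy becomes uninformative under heavy imbalance: a trivial model that always predicts the majority class looks strong while learning nothing. The sketch below, written with illustrative counts mirroring that 92:8 split, contrasts raw accuracy with balanced accuracy (the mean of per-class recalls), a standard metric for such cases.

```python
# Why extreme class imbalance makes raw accuracy misleading: with a 92:8
# split, a trivial model that always predicts the majority ("poor") class
# reaches 92% accuracy while being useless for embryo selection.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls; 0.5 means "no better than chance" for 2 classes.
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 92 poor (0) and 8 good (1) embryos; trivial majority-class predictor.
y_true = [0] * 92 + [1] * 8
y_pred = [0] * 100
print(accuracy(y_true, y_pred))           # 0.92
print(balanced_accuracy(y_true, y_pred))  # 0.5
```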
In terms of imaging techniques, thirteen studies employed time-lapse only, while eight utilised only optical light microscopy. Most studies based on static images used either optical microscope images or static frames from time-lapse videos as the image source. Two exceptions stand out: Diakiw et al.[27] used both types of static image sources, while Loewke et al.[15] uniquely included low-resolution static images from a stereo zoom microscope as a third image source.
In terms of focal planes, 17 studies employed static images with a single focal plane for the AI model training. Only one study utilised 11 focal planes per image for the evaluation of blastocysts, whereas another study utilised seven focal planes per image to train the AI model, showing diverse approaches in terms of input data.[21,28]
None of the studies explicitly indicated the elimination of confounding factors by excluding females with uterine abnormalities or issues related to uterine receptivity from the datasets used to train AI models. One study based on predicting the euploidy of blastocysts by AI eliminated the confounding uterine factors by only considering live birth and aneuploid miscarriages verified by CVS (Chorionic Villus Sampling).[17]
Seven studies incorporated data from both fresh and FETs, whereas five studies did not specify the type of embryo transfer strategy (fresh or frozen). Of the remaining studies, eight relied exclusively on ground truth data from FET cycles.
The datasets exhibit a variety of ground-truth measurements. Only one dataset employs serum beta-human chorionic gonadotropin (βHCG) levels on day 7 post-embryo transfer as an indicator of early pregnancy,[29] while six studies use foetal heartbeat scans between 6 and 10 weeks. Nine datasets used live birth as the endpoint. Additionally, five studies utilised Preimplantation Genetic Testing for Aneuploidy (PGT-A) results, providing a genetic perspective on embryo viability.[19,27,30–33]
SYNTHESIS OF RESULTS
Variability in Dataset Size
In our comprehensive review of 26 studies, each study was thoroughly scrutinised to ascertain the actual number of images used to train the AI model, as shown in Table 1.[15,17–40,62] Many papers reported an inflated dataset size but later, in the ‘Material and Methods’ section, described excluding poor-quality images or augmenting the data by flipping and rotation. Therefore, only the filtered images fed to the AI model during training or validation, without data augmentation, were counted towards the dataset size.
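The gap between reported and filtered counts can be large: horizontal/vertical flips combined with 90-degree rotations yield up to eight distinct orientations per source image, so an "augmented" figure may be eight times the number of unique images actually collected. The numbers below are illustrative, not taken from any study in this review.

```python
# How data augmentation can inflate a reported dataset size: flips plus
# 90-degree rotations give up to 8 distinct orientations per source image
# (the symmetries of a square), so an "augmented" count may be 8x the
# number of unique, quality-filtered images actually collected.
def effective_vs_augmented(n_raw, n_excluded_poor_quality, orientations=8):
    n_filtered = n_raw - n_excluded_poor_quality   # what this review counts
    n_augmented = n_filtered * orientations        # what some papers report
    return n_filtered, n_augmented

# Illustrative numbers only.
filtered, augmented = effective_vs_augmented(1000, 120)
print(filtered)   # 880 unique filtered images
print(augmented)  # 7040 after flip/rotation augmentation
```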
| Sr. no | Authors | Dataset size | Span of data collection | Number of clinics involved |
|---|---|---|---|---|
| 1 | Loewke et al.[15] | 5923 | 2015–2020 | 11 |
| 2 | Miyagi et al.[17] | 160 | 2008–2017 | 1 |
| 3 | Chen et al.[18] | 171,239 | 2014–2018 | 1 |
| 4 | Barnes et al.[19] | 10,378 | 2012–2017 | 2 |
| 5 | Khosravi et al.[20] | 1764 | 2012–2017 | 1 |
| 6 | Wang et al.[21] | 1025 | 2017–2018 | 1 |
| 7 | Bormann et al. [22] | 742 | Not mentioned | 1 |
| 8 | Bormann et al.[23] | 3469 | Not mentioned | 1 |
| 9 | Miyagi et al.[24] | 5691 | 2009–2017 | 1 |
| 10 | Kanakasabapathy et al.[25] | 542 | Not mentioned | 2 |
| 11 | Tran et al.[26] | 10,638 | 2014–2018 | 8 |
| 12 | Diakiw et al.[27] | 1001 | 2011–2020 | 10 |
| 13 | Giscard d’Estaing et al.[28] | 854 | 2013–2017 | 1 |
| 14 | Chavez-Badiola et al.[29] | 946 | 2015–2019 | 3 |
| 15 | Huang et al.[30] | 1803 | 2018–2019 | 1 |
| 16 | Diakiw et al.[31] | 5010 | 2011–2020 | 10 |
| 17 | Cimadomo et al.[32] | 3604 | 2013–2022 | 1 |
| 18 | Ma et al.[33] | 3405 | 2018–2021 | 2 |
| 19 | Onthuam et al.[34] | 1194 | 2018–2022 | 1 |
| 20 | Goldmeier et al.[35] | 608 | 2014–2017 | 3 |
| 21 | Geller et al.[36] | 361 | 2016–2019 | 4 |
| 22 | Berntsen et al.[37] | 115,832 | 2011–2019 | 18 |
| 23 | Enatsu et al.[38] | 19,342 | 2011–2019 | 1 |
| 24 | Liu et al.[39] | 17,580 | 2016–2020 | 1 |
| 25 | Yuan et al.[40] | 1036 | 2019–2022 | 1 |
| 26 | Huang et al.[62] | 15,434 | 2018–2019 | 1 |
A wide range of dataset sizes was observed, from a minimum of 160 images to a maximum of 171,239 images, as depicted in Figure 2.[17,18] It is important to note that two studies utilised fewer than 500 images, whereas two others utilised more than 100,000 images.[17,18,36,37] It is intuitively understood that a more extensive dataset contributes to better training and performance of AI models in blastocyst assessment.

- Dataset size used in studies. Two studies used fewer than 500 images to train the AI model. Nine studies utilised around 1000 images, while two studies used more than 100,000 images for training the AI model.
Dataset Demographics and Source
Most studies (15 out of 26) have focused their AI models on data from a single clinic, potentially limiting adaptability, as shown in Table 1.[17,18,20–24,28,30,32,37–39] Notable exceptions include a study using a diverse multinational dataset (USA, India, Spain and Malaysia) and another incorporating data from 18 clinics across five countries (Denmark, UK, Australia, Japan and Ireland), showing broader geographical coverage as shown in Figure 3.[31,37]

- Number of clinics involved in data collection. Of the 26 studies included in the systematic review, 15 collected data from a single clinic. Only three studies collected diverse data that spanned more than one country.
Variability in Image Capture Mechanism and Resolution
A diverse range of image resolutions (from 50 pixels × 50 pixels to 1280 pixels × 960 pixels) reflects varied equipment choices, as shown in Table 2. Notably, 13 studies employed only time-lapse devices, while eight solely utilised optical light microscopes, illustrating technological diversity.[18,25,29,31,36,38,39] The number of focal planes in the time-lapse videos ranged from 7 to 11, introducing complexity to the data collection method, as shown in Figure 4.
| Sr.No | Authors | Resolution of image | Type of input to AI model | Focal plane(s) | Types of camera used |
|---|---|---|---|---|---|
| 1 | Loewke et al.[15] | 224 pixels × 224 pixels, 112 pixels × 112 pixels, 56 pixels × 56 pixels | Static image, static images from TL, static images from Stereozoom | 1 | 8 types - Olympus IX71, Nikon Diaphot 300, Olympus IX73, Olympus IX70, Nikon Diaphot, NIKON Eclipse TE300, Nikon SMZ 800 (Stereo Zoom), Embryoscope |
| 2 | Miyagi et al.[17] | 100 pixels × 100 pixels | Static image | 1 | Make not mentioned |
| 3 | Chen et al.[18] | 264 pixels × 198 pixels | Static image | 1 | 1 type - Optical Light Microscope (Zeiss Axio Observer Z1) |
| 4 | Barnes et al.[19] | 500 pixels × 500 pixels | Static image | 1 | 2 types - EmbryoScope, Embryoscope plus |
| 5 | Khosravi et al.[20] | 500 pixels × 500 pixels | Static images from TL | 7 | 1 type -Embryoscope TL |
| 6 | Wang et al.[21] | 1280 pixels × 960 pixels | Static images from TL | 11 | 1 type - ASTEC Time Lapse - CCM-iBIS |
| 7 | Bormann et al. [22] | Not mentioned | Static image | 1 | 1 type - Embryoscope TL |
| 8 | Bormann et al.[23] | 210 pixels × 210 pixels | Static image | 1 | 1 type - Embryoscope TL |
| 9 | Miyagi et al.[24] | 50 pixels × 50 pixels | Static image | 1 | Optical light microscope. Make not mentioned. |
| 10 | Kanakasabapathy et al.[25] | Not mentioned | Static image | 1 | 1 type - Proprietary Optical System mounted on Smart Phone |
| 11 | Tran et al.[26] | whole video | TL video | Multiple | 2 types - Embryoscope, Embryoscope plus |
| 12 | Diakiw et al.[27] | 480 pixels × 480 pixels | Static image, static images from TL | 1 | 6 types - Cooper Surgical Saturn5, Saturn3, Vitrolife Octax, Hamilton Thorne LYKOS, EmbryoScope TL,Merck GERI TL |
| 13 | Giscard d’Estaing et al.[28] | 1000 pixels × 1000 pixels | Static images from TL | 7 | 1 type - Embryoscope TL |
| 14 | Chavez-Badiola et al.[29] | 640 pixels × 480 pixels or 807 pixels × 603 pixels | Static image | 1 | 2 types - Olympus IX71, Olympus IX73 |
| 15 | Huang et al.[30] | Not specified (video) | TL video | Multiple | 1 type - Embryoscope Plus TL |
| 16 | Diakiw et al.[31] | 480 pixels × 480 pixels | Static image | 1 | Optical light microscope. Make not mentioned. |
| 17 | Cimadomo et al.[32] | Not available (NA) | TL video | Multiple | 1 type - Embryoscope TL |
| 18 | Ma et al.[33] | Not specified (video) | TL video | Multiple | 1 type - Embryoscope Plus TL |
| 19 | Onthuam et al.[34] | 224 pixels × 224 pixels | Static image | 1 | Optical light microscope. Make not mentioned. |
| 20 | Goldmeier et al.[35] | 224 pixels × 224 pixels | TL video | Multiple | 1 type - Embryoscope TL |
| 21 | Geller et al.[36] | 224 pixels × 224 pixels | static image | 1 | Optical light microscope. Make not mentioned. |
| 22 | Berntsen et al.[37] | 256 pixels × 256 pixels | Static image | 1 | 2 types - embryoscope, embryoscope Plus |
| 23 | Enatsu et al.[38] | 480 pixels × 640 pixels | Static image | 1 | Optical light microscope. Make not mentioned. |
| 24 | Liu et al.[39] | 1024 pixels × 768 pixels | Static images | 2 | Optical light microscope. Make not mentioned. |
| 25 | Yuan et al.[40] | Not mentioned | TL video | Multiple | 1 type - Embryoscope Plus TL |
| 26 | Huang et al.[62] | 224 pixels × 224 pixels | Static images from TL | 1 | 1 type - Embryoscope Plus |
CCM-iBIS: Cell Culturing Monitoring - Integrated Ballistics Incubation System.
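The resolution spread in Table 2 means that, before any two models can be evaluated on shared data, all images must be resampled to a common size. A minimal nearest-neighbour resize, written here in plain Python with illustrative dimensions (500 × 500 down to 224 × 224, a common CNN input size), sketches this preprocessing step:

```python
# The resolution spread across datasets means images must be resampled to one
# size before models can be compared on shared data. A minimal nearest-
# neighbour resize over a plain 2D list of pixel values, no libraries needed:
def resize_nearest(pixels, out_w, out_h):
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Downsample an illustrative 500x500 greyscale grid to 224x224.
src = [[(r + c) % 256 for c in range(500)] for r in range(500)]
dst = resize_nearest(src, 224, 224)
print(len(dst), len(dst[0]))  # 224 224
```

In practice a library resampler with anti-aliasing would be preferred, but the index arithmetic above is the core of the operation.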

- Focal planes used to capture images in datasets. Out of 26 studies, 17 collected images using a single focal plane, i.e., static images, while a few studies relied on capturing images of the same blastocyst at varying focal planes ranging from 2 to 11 focal planes.
In the majority of studies (12 of 26), a single data capture mechanism was used exclusively. Two studies did not specify their image capture technique.[17,24] Only two studies utilised more than one image-capture mechanism. Diakiw et al.[27] predominantly used static images from optical light microscopes for training and validation, reserving static frames from time-lapse videos specifically for testing, as shown in Figure 5. Loewke et al.[15] incorporated low-resolution images (56 pixels × 56 pixels) from a stereo zoom microscope as part of their training data, while using static frames from time-lapse videos exclusively to test their model.

- Dataset image capture source. Out of 26 studies included in the systematic review, 14 collected images from the time-lapse system, while eight studies relied solely on static images collected from an optical light microscope.
Dataset Image Capture Camera Model
A notable observation emerged while evaluating the camera models used across the study datasets, as shown in Figure 6. Among the 26 studies, more than half (17 out of 26) used a single type of device to capture images, either a single type of optical light microscope or a single type of time-lapse incubator fitted with a single camera.[18–25,28,30,32,33,35,36,39,40] Exceptions are two studies that included six and eight camera models, respectively, for collecting their input datasets.[15,27]

- Types of image capture mechanisms used in the collection of image datasets. Thirteen studies used images from a single type of optical system, while seven did not specify the make or model of the optical system and camera. Two studies utilised six and eight types of image capture mechanisms, respectively.
Dataset Image Capture Timing
The timing of blastocyst image capture presents a significant discrepancy among studies. Thirteen studies lacked specific mention of the image or video capture timing on day 5 or day 6.[15,26–28,31,33,36,38–40,41] Among those providing the exact timing, seven studies utilised images captured within the fixed range of 110–115 hpi, as shown in Table 3.[17,19–23,25] Additionally, six studies incorporated a wider range of timing, capturing images anywhere between 104 and 140 hours, i.e., day 5 to day 6.[18,24,28,30,32,33,37]
| Sr. no | Authors | Timing of image capture | Day of image capture |
|---|---|---|---|
| 1 | Loewke et al.[15] | Not mentioned | D5, D6, or D7 |
| 2 | Miyagi et al.[17] | 115 hpi | D5 |
| 3 | Chen et al.[18] | 112–116 hpi (D5), 136–140 hpi (D6) | D5, D6 |
| 4 | Barnes et al.[19] | 110 hpi | D5 |
| 5 | Khosravi et al.[20] | 110 hpi | D5 |
| 6 | Wang et al.[21] | 116 ± 1 hpi | D5 |
| 7 | Bormann et al. [22] | 113 hpi | D5 |
| 8 | Bormann et al.[23] | 113 ± 0.05 hpi | D5 |
| 9 | Miyagi et al.[24] | 115 or 139 hpi | D5, D6 |
| 10 | Kanakasabapathy et al.[25] | 113 hpi | D5 |
| 11 | Tran et al.[26] | Not mentioned | D5 |
| 12 | Diakiw et al.[27] | Not mentioned | D5, D6 |
| 13 | Giscard d’Estaing et al.[28] | Not mentioned | D5, D6 |
| 14 | Chavez-Badiola et al.[29] | 130–131.8 hpi | D6 |
| 15 | Huang et al.[30] | 110–116 hpi (D5), 132.5–136 hpi (D6) | D5, D6 |
| 16 | Diakiw et al.[31] | Not mentioned | D5 |
| 17 | Cimadomo et al.[32] | Less than 120 hpi = D5; 121–144 hpi = D6; greater than 144 hpi = D7 | D5, D6, D7 |
| 18 | Ma et al.[33] | Not mentioned | D5, D6 |
| 19 | Onthuam et al.[34] | Not mentioned | D5 |
| 20 | Goldmeier et al.[35] | Not mentioned | D5 |
| 21 | Geller et al.[36] | Not mentioned | D5 |
| 22 | Berntsen et al.[37] | 108–140 hpi | D5, D6 |
| 23 | Enatsu et al.[38] | Not mentioned | D5 |
| 24 | Liu et al.[39] | Not mentioned | D5 |
| 25 | Yuan et al.[40] | Not mentioned | D5, D6 |
| 26 | Huang et al.[62] | 105–125 hpi | D5, D6 |
Only 7 out of 26 studies used a fixed time frame on day 5, with respect to hpi, to capture blastocyst images, as shown in Figure 7. One study utilised a variable timeframe for image capture, anywhere on day 5, 6, or 7, which raises questions about the consistency of the dataset.[15]

- Day and timing of image capture of datasets. Only 7 out of 26 studies use a fixed time frame with respect to hours post insemination (hpi) to capture a blastocyst image.
Dataset Endpoints or Ground Truth
In the majority of studies included in this systematic review, the correlation between the dataset endpoint and the intended objective of blastocyst assessment was notable, as shown in Table 4. An exception worth noting is the study that focused on ranking embryos based on their implantation potential.[29] Here, the endpoint, arguably premature, was serum βHCG measured on day 7 post-embryo transfer; a more robust approach might have been assessing βHCG levels at a later date or monitoring foetal heartbeat at 6–8 weeks. The bar chart in Figure 8 represents the endpoints, or ground truth, adopted in the various studies included in this systematic review.
| Sr. no | Authors | Class distribution (% negative : % positive) | Primary aim of blastocyst assessment | Endpoint/ground truth |
|---|---|---|---|---|
| 1 | Loewke et al.[15] | Not clear | Ranking blastocyst based on morphology | FHB at 6–8 weeks |
| 2 | Miyagi et al.[17] | 50% : 50% | Predict probability of live birth (distinguish between normal and abortion categories) | Live birth or aneuploid abortion verified by CVS |
| 3 | Chen et al.[18] | NA | Automated grading of blastocyst | Blastocyst grading |
| 4 | Barnes et al.[19] | 57% : 43% | Prediction of human blastocyst ploidy | PGT-A result, FHB for non PGT-A embryo transfers |
| 5 | Khosravi et al.[20] | 50% : 50% | Assess blastocyst quality. Identify good-quality and poor-quality images | Good vs. bad embryo classification |
| 6 | Wang et al.[21] | 49% : 51% | Comparing blastocyst quality evaluation using multifocal images | Blastocyst classification |
| 7 | Bormann et al. [22] | Not clear | Scoring embryo to make disposition decisions for biopsy, freeze or discard | Embryo selection for biopsy, cryopreservation or discard |
| 8 | Bormann et al.[23] | 54% : 44% | Identify blastocyst capable of implantation | FHB |
| 9 | Miyagi et al.[24] | 72% : 28% | Evaluate likelihood of clinical pregnancy | Live birth |
| 10 | Kanakasabapathy et al.[25] | Not clear | Assessment of blastocyst morphology for classification of embryo | Blastocyst vs. non blastocyst classification |
| 11 | Tran et al.[26] | 92% : 8% | Foetal heartbeat prediction from Time Lapse video without morphokinetic annotation | FHB at 7 weeks |
| 12 | Diakiw et al.[27] | 41% : 59% | Predict human embryo ploidy status using static images | PGT-A result |
| 13 | Giscard d’Estaing et al.[28] | 28% : 72% | Live birth prediction from blastocyst scoring | Live birth |
| 14 | Chavez-Badiola et al.[29] | 56% : 44% | Predict ploidy (viability) and implantation using static blastocyst images | Serum βHCG on day 7 of ET |
| 15 | Huang et al.[30] | Not clear | Predict embryo ploidy status based on timelapse data | PGT-A result |
| 16 | Diakiw et al.[31] | 43% : 57% | Likelihood of clinical pregnancy | PGT-A result |
| 17 | Cimadomo et al.[32] | Not clear | Embryo grading and Euploidy prediction | Live birth |
| 18 | Ma et al.[33] | 43% : 44% : 12% | Embryo selection using AI model trained with metadata | PGT-A result |
| 19 | Onthuam et al.[34] | 31% : 69% | Predict embryo aneuploidy | FHB at 6–10 weeks |
| 20 | Goldmeier et al.[35] | 67% : 32% | Predict embryo implantation potential by automated calculation of morphometric parameters | FHB at 6–8 weeks |
| 21 | Geller et al.[36] | 42% : 58% | Predicting whether an embryo will lead to a pregnancy and predict outcome of that pregnancy | Live birth |
| 22 | Berntsen et al.[37] | 71% : 29% | Correlation between morphokinetics and AI assessment | FHB at 7 weeks |
| 23 | Enatsu et al.[38] | 61% : 39% | Comparison of live birth prediction using image only with image + clinical data ensemble AI model | Live birth |
| 24 | Liu et al.[39] | 64% : 36% | Live birth prediction model based on image and clinical data | Live birth |
| 25 | Yuan et al.[40] | 47% : 53% | Automated blastocyst quality assessment with multifocal images ( good vs poor) | Live birth |
| 26 | Huang et al.[62] | 54% : 46% | Predict the probability of live birth from timelapse data | Live birth |
PGT-A: Preimplantation genetic testing for aneuploidy, FHB: Foetal heartbeat, βHCG: Beta human chorionic gonadotropin, ET: Embryo transfer, CVS: Chorionic villus sampling, TL: Time lapse.

- Ground truth or endpoints of studies. Only one study used serum βHCG as the endpoint, while nine studies tracked patients until live birth. βHCG: Beta human chorionic gonadotropin, PGT-A: Preimplantation genetic testing for aneuploidy, FHB: Foetal heartbeat.
Another noteworthy study in the review centred on blastocyst assessment aimed at assigning a probability score for live birth. In the study by Miyagi et al.,[17] the endpoint was defined by live birth or aneuploid abortion verified by CVS, a more robust method than relying solely on negative βHCG or absent cardiac activity.
Dataset Cleansing – Removing Confounders
Most of the studies on blastocyst assessment aimed to predict viability or assign a predictive score. These studies did not filter their datasets to remove confounding factors, such as uterine abnormalities or receptivity, that could influence the clinical endpoint of foetal heartbeat at 6–8 weeks.
Only one study excluded confounders by only considering aneuploid miscarriages as the negative class, eliminating various confounding factors related to uterine abnormalities.[17]
Dataset Sampling Strategy and Class Imbalance
Considerable variation in dataset class distribution was observed among the included studies, as detailed in Table 4. A realistic distribution of close to 40% good embryos and 60% poor embryos was commonly employed in 9 out of 26 studies, as shown in Figure 9, while seven studies approached an even 50%–50% split.

- Dataset class distribution—percentage of negative class vs. positive class. Notably, five studies did not have a clear mention of the class distribution. Seven adopted a class distribution of 50%–50% between negative and positive classes, while another nine opted for a distribution closer to reality, with 60% negative and 40% positive classes.
However, one study employed an unusually imbalanced class distribution, opting for an 8% positive and 92% negative split.[26] This emphasises the lack of standardised common practices in defining the composition of training datasets and their sampling strategies.
Embryo Transfer Strategy: Fresh vs. Frozen Transfer
Including data from both fresh embryo transfers and FETs in seven studies introduces a confounder to the predictive model, as shown in Table 5.[15,22,26–29,37] It is widely acknowledged that fresh embryo transfers tend to have a lower pregnancy rate than FET cycles, primarily due to compromised uterine receptivity.[42] Studies focused on predicting embryo viability and euploidy logically opted for FET as their preferred embryo transfer strategy.[19,27,30,31] This choice aligns with the nature of these studies, which relied on predicting euploidy outcomes, where the endpoint was the PGT-A result.
| Sr. no | Authors | Embryo transfer strategy | Meta data inclusion in AI model training |
|---|---|---|---|
| 1 | Loewke et al.[15] | Both fresh ET and FET | Maternal age, morphokinetic parameters |
| 2 | Miyagi et al.[17] | Not mentioned | Maternal age, clinical parameters |
| 3 | Chen et al.[18] | Not applicable, only blastocyst assessment | Not applicable |
| 4 | Barnes et al.[19] | FET | Maternal age, morphokinetic parameters |
| 5 | Khosravi et al.[20] | Not applicable, differentiation between good & bad | Not applicable |
| 6 | Wang et al.[21] | Not applicable, only blastocyst assessment | Not applicable |
| 7 | Bormann et al.[22] | Not applicable, only blastocyst assessment | Not applicable |
| 8 | Bormann et al.[23] | Both fresh ET and FET | Not clear |
| 9 | Miyagi et al.[24] | Not mentioned | Morphological parameters, clinical parameters |
| 10 | Kanakasabapathy et al.[25] | Not applicable, only morphology assessment | Not applicable |
| 11 | Tran et al.[26] | Both fresh ET and FET | None |
| 12 | Diakiw et al.[27] | FET | None |
| 13 | Giscard d’Estaing et al.[28] | Both fresh ET and FET | Maternal age, morphokinetic parameters |
| 14 | Chavez-Badiola et al.[29] | Both fresh ET and FET | Maternal age, hours post insemination |
| 15 | Huang et al.[30] | FET | Maternal age, morphokinetic parameters |
| 16 | Diakiw et al.[31] | FET | None |
| 17 | Cimadomo et al.[32] | FET | None |
| 18 | Ma et al.[33] | FET | Maternal age, morphokinetic parameters |
| 19 | Onthuam et al.[34] | Not mentioned | Maternal age and Istanbul grading scores (stage, ICM, and TE) |
| 20 | Goldmeier et al.[35] | Fresh ET | Maternal age, morphokinetic parameters |
| 21 | Geller et al.[36] | Not mentioned | None |
| 22 | Berntsen et al.[37] | Both fresh ET and FET | None |
| 23 | Enatsu et al.[38] | Not mentioned | Maternal age, clinical parameters |
| 24 | Liu et al.[39] | FET | Maternal age, clinical parameters |
| 25 | Yuan et al.[40] | FET | Morphokinetic, morphological and clinical parameters |
| 26 | Huang et al.[62] | Both fresh ET and FET | None |
ET: Embryo Transfer, FET: Frozen Embryo Transfer, ICM: Inner Cell Mass, TE: Trophectoderm.
Five studies did not clearly mention the embryo transfer strategy, as shown in Figure 10. Since the endpoints of these studies were the classification or grading of blastocysts, the embryo transfer strategy was not applicable.[18,20–22,25]

- Embryo transfer strategy employed in datasets. Seven studies included data from both fresh and frozen transfers, while five failed to mention the embryo transfer strategy.
Dataset Inclusion and Exclusion Criteria
In the study by Loewke et al., discarded or mislabelled blastocysts were included in the negative outcome class.[15] However, assuming negative outcomes for all discarded blastocysts may be flawed, as supernumerary blastocysts are often discarded without known outcomes, especially when patients withhold consent for freezing.
Four studies failed to clarify the embryo transfer strategy, i.e., fresh or FET, employed in the datasets collected, a factor known to influence IVF cycle outcomes.[17,24,36,38] Including such data has the potential to influence model training and overall performance.
Achieving a balance between inclusion and exclusion criteria is a pivotal factor in constructing a dataset that accurately represents the intended predictive endpoint.
Inclusion of Metadata in Training AI Model
Five studies that concentrated on assigning a grade to blastocysts or categorising them as good or poor consciously chose not to incorporate any metadata, as shown in Figure 11.[18,21,22,25,32] This decision aligns with the nature of their task, in which the role of metadata may not be considered significant.

- Different types of metadata used along with the blastocyst image. Out of 13 studies that used metadata, two used only maternal age. Notably, three studies used other clinical parameters apart from maternal age. ICM: Inner Cell Mass, TE: Trophectoderm.
Surprisingly, a few studies that specifically targeted predicting embryo viability and implantation potential omitted patient metadata altogether.[17,26,31,36,37] However, one noteworthy study trained its AI model on a comprehensive range of clinical parameters beyond the morphological parameters of the embryo, including maternal age, number of embryo transfers, anti-Müllerian hormone concentration, day-3 blastomere number, grade on day 3, embryo cryopreservation day, ICM, TE, average diameter, and body mass index (BMI), as shown in Figure 11.[24] A more recent model also integrated maternal age with pseudo-features derived from Istanbul grading to enhance predictive performance.[34] Notably, automated extraction of morphometric parameters has demonstrated a positive association with implantation potential.[35]
Another recent study employed a deep learning pipeline using day-5 embryo images along with maternal age and pseudo-features derived from Istanbul grading scores (stage, ICM, and TE), combining them with self-supervised and Generative Adversarial network (GAN)-augmented models to enhance viability prediction. This approach reflects a growing trend toward hybrid models that merge visual and clinical data to improve predictive accuracy.[34]
Additionally, seven studies that utilised time-lapse images and videos integrated embryo morphokinetic data into their model training, as shown in Figure 11, recognising the valuable insights these parameters provide. This diversity in metadata utilisation highlights the need for standardised practices and considerations in optimising models for predicting implantation potential.
Among studies using metadata for AI training, three studies utilised maternal age with other clinical data, as shown in Figure 11.[29,38,39] Four studies relied on morphokinetic data.[22,23,28,37] This diverse approach showcases ongoing experimentation to boost AI models for predicting embryo outcomes, logically emphasising that the use of metadata should enhance the model accuracy.
Public Availability of Data
Of the 26 studies reviewed, a significant portion lacked transparency regarding the dataset availability. Specifically, 14 studies did not explicitly mention whether their datasets were accessible to other researchers. Among the remaining 12, seven studies indicated that anonymised datasets can be obtained upon request, reflecting a willingness to share data while ensuring participant privacy.[17,22,30,37,39,40] However, four studies explicitly denied public access to their embryo images and patient data, citing ethical and privacy reasons and limiting access to collaborators.[19,20,27,33] This disparity in data availability highlights the ongoing challenges of balancing data sharing with ethical considerations in reproductive research.
A consolidated overview of dataset characteristics, including but not limited to study type, dataset size, span of data collection, image resolution, timing of image capture, metadata inclusion, and clinical endpoints, is presented in Table 6, reflecting the wide heterogeneity observed across the 26 included studies.
| Title | Span of data collection | Dataset size | Image capture timing | Number of clinics involved | Image capture source | Image format | Types of camera used | Resolution of image | Number of focal planes | Metadata used by model? | Dataset sampling (% negative vs % positive class) | ET strategy | Ground truth | Public data availability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF | 2011 to 2020 | 5010 | Not mentioned (D5) | 10 | Optical light microscope | Static | Optical light microscope. Make not mentioned. | 480 pixels x 480 pixels | 1 | No | 42% : 58% | FET | PGT-A result | Not mentioned |
| 2. Robust and generalizable embryo selection based on artificial intelligence and time-lapse image sequences | 2011 to 2019 | 115832 | Approx. 108 to 140 hpi (D5, D6) | 18 | Time lapse | Static | 2 Types - Embryoscope, Embryoscope Plus | 256 pixels x 256 pixels | 1 | Yes (morphokinetic annotations) | 71% : 29% | Mix | FH at 7 weeks | Restricted access - anonymised available on request |
| 3. Deep learning as a predictive tool for foetal heart pregnancy following time-lapse incubation and blastocyst transfer | 2014 to 2018 | 10638 | Not mentioned (D5) | 8 | Time lapse | Video | 2 Types - Embryoscope, Embryoscope Plus | Whole video | Multiple | No | 92% : 8% | Mix | FH at 7 weeks | Not mentioned |
| 4. Characterization of an artificial intelligence model for ranking static images of blastocyst stage embryos | 2015 to 2020 | 5923 | D5, D6, D7 | 11 | Optical light microscope + TL | Static | 8 Types - Olympus IX71, Nikon Diaphot 300, Olympus IX73, Olympus IX70, Nikon Diaphot, Nikon Eclipse TE300, Nikon SMZ 800 (Stereo Zoom), Embryoscope | 224 pixels x 224 pixels, 112 pixels x 112 pixels, 56 pixels x 56 pixels | 1 | Yes (maternal age, and day of image capture) | Not clear | Mix | FH 6 to 8 weeks | Not mentioned |
| 5. A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy - a retrospective model development and validation study | 2012 to 2017 | 10378 | 110 hpi | 2 | Time lapse | Static | 2 Types - EmbryoScope, Embryoscope Plus | 500 pixels x 500 pixels | 1 | Yes (morphokinetic annotations, maternal age) | 57% : 43% | FET | FH 6 to 8 weeks | Not publicly available |
| 6. Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization | 2012 to 2017 | 1764 | 110 hpi | 1 | Time lapse | Static | 1 Type -Embryoscope TL | 500 pixels x 500 pixels | 7 | Yes (maternal age) | 50% : 50% | NA | live birth | Not publicly available |
| 7. Consistency and objectivity of automated embryo assessments using deep neural networks | Not mentioned | 742 | 113 hpi | 1 | Time lapse | Static | 1 Type - Embryoscope TL | Not mentioned | 1 | Yes (morphokinetic annotations) | Not clear | NA | Embryo grade | Not mentioned |
| 8. Performance of a deep learning based neural network in the selection of human blastocysts for implantation | Not mentioned | 3469 | 113 ± 0.5 hpi | 1 | Time lapse | Static | 1 Type - Embryoscope TL | 210 pixels x 210 pixels | 1 | Yes (morphokinetic annotations) | 52% : 48% | Mix | FH at 6 weeks | Restricted access - anonymised available on request |
| 9. A machine learning system with reinforcement capacity for predicting the fate of an ART embryo | 2013 to 2017 | 854 | Not mentioned (D5, D6) | 1 | Time lapse | Static + video | 1 Type - Embryoscope TL | 1000 pixels x 1000 pixels | Multiple | Yes (morphokinetic annotations) | 29% : 71% | FET | Live birth | Not mentioned |
| 10. Towards Automation in IVF: Pre-Clinical Validation of a Deep Learning-Based Embryo Grading System during PGT-A Cycles | 2013 to 2022 | 3604 | D5, D6, D7 | 1 | Time lapse | Video | 1 Type - Embryoscope TL | NA (Not available) | Multiple | No | Not clear | FET | live birth | Not mentioned |
| 11. A novel system based on artificial intelligence for predicting blastocyst viability and visualizing the explanation | 2011 to 2019 | 19342 | D5 or D6 | 1 | Optical light microscope | Static | Optical light microscope. Make not mentioned. | 480 pixels x 640 pixels | 1 | Yes (maternal age, clinical parameters) | 61% : 39% | Not mentioned | FHB | Not mentioned |
| 12. Embryo Ranking Intelligent Classification Algorithm (ERICA): artificial intelligence clinical assistant predicting embryo ploidy and implantation | 2015 to 2019 | 946 | 130.0 to 131.8 hpi (D6) | 3 | Optical light microscope | Static | 2 Types - Olympus IX71, Olympus IX73 | 640 pixels x 480 pixels or 807 pixels x 603 pixels | 1 | Yes (maternal age, hpi) | 44% : 56% | Not mentioned | βHCG | Not mentioned |
| 13. Feasibility of artificial intelligence for predicting live birth without aneuploidy from a blastocyst image | 2008 to 2017 | 160 | 115 hpi | 1 | Not mentioned | Static | Make not mentioned | 100 pixels x 100 pixels | 1 | No | 50% : 50% | FET | live birth | Not mentioned |
| 14. Development and evaluation of inexpensive automated deep learning-based imaging systems for embryology | Not mentioned | 542 | 113 hpi | 2 | Optical light microscope | Static | 1 Type - Proprietary optical system mounted on smart phone | Not mentioned | 1 | No | Not clear | NA | embryo grade | Not mentioned |
| 15. Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study | 2016 to 2020 | 17580 | Not Mentioned (D5) | 1 | Optical light microscope | Static | Optical light microscope. Make not mentioned. | 1024 pixels x 768 pixels | 2 | Yes ( maternal age, clinical parameters) | 64% : 36% | FET | live birth | Restricted access - anonymized available on request |
| 16. A Deep Learning Framework Design for Automatic Blastocyst Evaluation With Multifocal Images | 2017 to 2018 | 1025 | 116 ± 1 hpi | 1 | Time lapse | Static | 1 Type - ASTEC Time lapse - CCM-iBIS | 1280 pixels x 960 pixels | 11 | No | 51% : 49% | NA | Embryo grade | Not mentioned |
| 17. Predicting a live birth by artificial intelligence incorporating both the blastocyst image and conventional embryo evaluation parameters | 2009 to 2017 | 5691 | 115 h or 139 h | 1 | Not mentioned | Static | Optical light microscope. Make not mentioned. | 50 pixels x 50 pixels | 1 | Yes (maternal age, 10 clinical parameters) | 28% : 72% | Not mentioned | Live birth | Not mentioned |
| 18. Using deep learning to predict the outcome of live birth from more than 10,000 embryo data | 2018 to 2019 | 15434 | 105 hpi to 125 hpi (D5) | 1 | Time lapse | Static | 1 Type - Embryoscope Plus | 224 pixels x 224 pixels | 1 | No | 54% : 46% | Mix | Live birth | Anonymised available on request |
| 19. Using Deep Learning with Large Dataset of Microscope Images to Develop an Automated Embryo Grading System | 2014 to 2018 | 171239 | 112 to 116 hours (Day 5) or 136 to 140 hours (Day 6) | 1 | Optical light microscope | Static (1 image - 1 Focal plane) | 1 Type - Optical Light Microscope (Zeiss Axio Observer Z1) | 264 pixels x 198 pixels | 1 | No | 43% : 57% | NA | Embryo grade | Not mentioned |
| 20. An artificial intelligence model (euploid prediction algorithm) can predict embryo ploidy status based on timelapse data | 2018 to 2019 | 1803 | D5 - 110-116 h, D6 - 132.5-136 h | 1 | Time lapse | Video | 1 Type - Embryoscope Plus TL | Not specified (video) | Multiple | Yes (morphokinetic annotations, maternal age) | Not clear | FET | PGT-A result | Anonymised available on request |
| 21. Development of an artificial intelligence based model for predicting the euploidy of blastocysts in PGT-A treatments | 2019 to 2022 | 1036 | D5, D6 | 1 | Time lapse | Video | 1 Type - Embryoscope Plus TL | Not mentioned | Multiple | Yes (morphokinetic annotations, morphological parameters, maternal age) | 51% : 49% | NA | Embryo grade | Anonymised available on request |
| 22. An Artificial IntelligenceBased Algorithm for Predicting Pregnancy Success Using Static Images Captured by Optical Light Microscopy | 2016 to 2019 | 361 | Not Mentioned (D5) | 4 | Optical light microscope | Static | Optical light microscope. Make not mentioned. | 224 pixels x 224 pixels | 1 | No | 42% : 48% | Not Mentioned | FHB | Not applicable |
| 23. Development of an artificial intelligence model for predicting the likelihood of human embryo euploidy based on blastocyst images from multiple imaging systems during IVF | 2011 to 2020 | 1001 | Not mentioned (D5, D6) | 10 | Optical light microscope + TL | Static | 6 Types - Cooper Surgical Saturn5, Saturn3, Vitrolife Octax, Hamilton Thorne LYKOS, EmbryoScope TL, Merck GERI TL | 480 pixels x 480 pixels | 1 | Yes (maternal age) | 22% : 78% | FET | PGT-A result | Not publicly available |
| 24. Combined Input Deep Learning Pipeline for Embryo Selection for In Vitro Fertilization Using Light Microscopic Images and Additional Features | 2018 to 2022 | 1194 | Not mentioned (D5) | 1 | Optical light microscope | Static | Optical light microscope. Make not mentioned. | 224 pixels x 224 pixels | 1 | Yes (maternal age + Istanbul grading Exp, ICM, TE) | 69% : 31% | Not mentioned | FHB at 6 to 10 weeks | Anonymised available on request |
| 25. Enhancing clinical utility-deep learning-based embryo scoring model for non-invasive aneuploidy prediction | 2018 to 2021 | 3405 | D5, D6 | 2 | Time lapse | Video | 1 Type - Embryoscope Plus TL | Not specified (video) | Multiple | Yes (maternal age) | 44% aneuploid : 56% (45% euploid + 12% mosaic) | FET | PGT-A result | Not applicable |
| 26. An artificial intelligence algorithm for automated blastocyst morphometric parameters demonstrates a positive association with implantation potential | 2014 to 2017 | 608 | D5 Variable (tEB) | 3 | Time lapse | Video | 1 Type - Embryoscope TL | 224 pixels x 224 pixels | Multiple | Yes (maternal age) | 67.1% : 32.9% | Fresh | FHB | Not publicly available |
Table 6: Comparative summary of dataset characteristics extracted from the 26 studies evaluating AI-based blastocyst assessment, including dataset size, span of data collection, number of clinics involved in data collection, type of study, public data availability, image resolution, type of input to the AI model, number of focal planes, types of camera used, timing and day of image capture, embryo transfer strategy, metadata inclusion in AI model training, primary aim of blastocyst assessment, and the endpoint or ground truth used.
DISCUSSION
Importance of Dataset Size
The impact of the sample size on the performance of AI models for blastocyst assessment is significant and multifaceted. Although a larger sample size intuitively correlates with improved model performance, the quality and distribution of data within the dataset are of paramount importance.[43] Studies have shown that satisfactory model performance can be achieved even with a smaller sample size that contains high-quality data and is representative of the problem domain.[44] However, it is essential to recognise that there exists a point of diminishing returns where the effect sizes and ML accuracies plateau after reaching a certain sample size threshold.[45] This suggests that simply increasing the sample size may not always lead to significant improvements in model performance. Therefore, when evaluating sample size adequacy, it is crucial to consider the quantity, quality, and sampling strategy of data to ensure optimal model performance in blastocyst assessment studies, while also acknowledging the resource constraints involved in obtaining and annotating large datasets.
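The diminishing-returns behaviour described above can be checked empirically by measuring model accuracy at increasing training-set sizes. The sketch below (the accuracy values are purely illustrative, not drawn from any reviewed study) locates the smallest dataset size beyond which additional data adds less than one percentage point of accuracy:

```python
# Illustrative sketch: detecting the sample-size plateau described in the
# text. Accuracies at each training-set size would, in practice, come from
# cross-validated model evaluations; the numbers here are hypothetical.

def plateau_point(sizes, accuracies, min_gain=0.01):
    """Return the smallest training-set size after which adding more data
    improves accuracy by less than `min_gain` (default: 1 percentage point)."""
    for i in range(1, len(sizes)):
        if accuracies[i] - accuracies[i - 1] < min_gain:
            return sizes[i - 1]
    return sizes[-1]  # no plateau observed within the measured range

sizes = [500, 1000, 2000, 5000, 10000]
accs = [0.68, 0.74, 0.78, 0.785, 0.787]  # hypothetical accuracies
print(plateau_point(sizes, accs))  # 2000
```

Such a check supports the point above: beyond the plateau, resources are better spent on data quality and annotation than on sheer volume.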
Importance of Diversity in Training Data
Diverse demographic dataset sources are crucial for robust and universal AI models for blastocyst assessment. Training on varied datasets goes beyond regional boundaries, ensuring adaptability to global populations and IVF laboratory settings. AI models benefit from capturing diverse blastocyst morphologies and morphokinetic trends owing to patient ethnicity, enhancing their applicability across different clinical settings worldwide.[46,47]
These findings set the stage for a broader discussion of the implications of dataset demographics, emphasising the need for standardised approaches in selecting datasets for AI model training.
In addition to data from diverse demographics, incorporating various image capture mechanisms is intuitively important. While training with diverse image-capturing mechanisms may seem important, there are strategies such as domain adaptation and domain generalisation that can be employed to address the issue of domain shift and enable seamless integration of these AI models into real-world clinical settings.
Domain adaptation enables models trained in one setting to perform well in others, whereas domain generalisation aims to maintain model performance across multiple domains without specific adaptation.[48,49] Personalised federated learning allows for the training of a shared model with data from multiple clinics, which can then be tailored to individual practices.[50] These strategies ensure that AI models are versatile and customisable in various clinical settings.
Importance of Using Multiple Focal Planes for Image Capture
Although it may seem intuitive that AI models for embryo assessment could benefit from using multiple focal planes to better visualise key characteristics, such as the ICM and TE, the current research does not provide a solid basis for this claim. In a recent abstract that explored the performance of an AI model trained on static images using a single focal depth, there was no statistically significant difference in model performance when applied to multifocal time-lapse images with different focal planes.[50,51] Another abstract mentioned that techniques like ensemble learning and test-time augmentation can also be used to reduce the sensitivity of AI models to different focal planes while maintaining model performance.[52]
However, a recent retrospective study of 2555 day 5 blastocysts showed that training the AI model with segmented images highlighting the ICM and TE improved its predictive performance, with the AUC increasing from 0.716 to 0.741.[41] The model was also able to focus on clinically relevant regions in 99% of cases, compared to 86% when trained on unsegmented images. This suggests that providing structured input emphasising biologically important regions can enhance the accuracy of AI models in embryo assessment.
Hence, we currently do not have sufficient evidence to suggest that multifocal images will be better for training AI models, and this needs further research with larger sample sizes.
Implications of Wider Range in Image Capture Timing
Blastocysts are highly metabolically active, and even a slight 3–5-hour gap in assessment timing can significantly impact their grading.[42,53] None of the reviewed papers, especially those using static blastocyst images, explicitly specified whether the timing of image capture was incorporated as temporal metadata alongside the input image when training AI models tasked with assessing and ranking blastocysts. It is worth noting that the Istanbul consensus recommends that blastocyst assessment be performed at 116 ± 1 hours post-insemination.[5]
The importance of this parameter becomes evident when considering scenarios such as comparing a grade 4AA blastocyst on day 5 with a similar grade 4AA blastocyst on day 6. Associating these with a similar score would be inaccurate. For optimal training of AI models in blastocyst assessment, the ideal dataset should use images captured within a consistent time frame, ensuring greater accuracy and comparability of the results.
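One way to act on this recommendation, sketched below with hypothetical feature names, is to attach the image-capture timing as an explicit input feature, normalised around the Istanbul consensus reference of 116 hours post-insemination, so that a day-5 and a day-6 blastocyst of the same grade remain distinguishable to the model:

```python
# Illustrative sketch (all names hypothetical): encoding image-capture
# timing as temporal metadata alongside the image features, as the review
# recommends. Timing is expressed as the deviation from the Istanbul
# consensus assessment time (116 hpi), in fractions of a day.

ISTANBUL_REFERENCE_HPI = 116.0

def add_timing_feature(image_features, capture_hpi, scale=24.0):
    """Append a normalised capture-time feature to an image feature vector."""
    timing = (capture_hpi - ISTANBUL_REFERENCE_HPI) / scale
    return list(image_features) + [timing]

# Two images with identical visual features but different capture times
# now produce distinct inputs for the model:
day5 = add_timing_feature([0.12, 0.87], capture_hpi=116)  # timing feature 0.0
day6 = add_timing_feature([0.12, 0.87], capture_hpi=140)  # timing feature 1.0
```

This is a sketch of the general idea only; a production model would fold the feature into its input pipeline alongside any other clinical metadata.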
Ideal Endpoint or Ground Truth for Datasets Used for Blastocyst Assessment
The choice of endpoint in blastocyst assessment studies depends heavily on the primary aim of assessment. When the goal is to rank and prioritise blastocysts for transfer, an ideal endpoint would be the detection of cardiac activity, as it provides a more immediate indicator of viability compared to live birth, which is influenced by numerous confounding factors.[54]
In studies aimed at predicting euploidy, the choice of endpoint is complex. Live birth serves as a comprehensive endpoint, encapsulating the ultimate goal of achieving a successful pregnancy. However, for cases resulting in abortion, assessing the genetic status of the conceptus using CVS can offer valuable insights. By doing so, we can gain a more accurate understanding of the predictive capabilities of the model.
Importance of Addressing Confounders in AI Model Training
Maternal factors such as very advanced maternal age (age > 40 years) and high BMI, as well as uterine factors such as adenomyosis, endometriosis, and uterine structural anomalies, significantly affect clinical pregnancy and live birth outcomes.[55–57] It is crucial to acknowledge that neglecting these confounders during dataset preparation may introduce biases that affect the robustness of AI model learning.
Training an AI model without accounting for these confounders can negatively affect its performance. For example, if an AI model learns from cases where a blastocyst with favourable morphology does not result in pregnancy due to underlying uterine abnormalities, it may lead to incorrect generalisations. This failure to consider the broader clinical context can impair the AI model’s understanding of the true relationship between embryo quality and outcomes. Such misleading data are labelled as ‘noisy data’ and have the ability to impact model performance.[58]
Excluding confounders is vital not only for ensuring that the dataset represents clinical reality accurately but also for optimising the learning process of AI models, allowing them to capture the various factors associated only with blastocyst morphology that can influence reproductive outcomes.
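As a minimal illustration of such dataset cleansing, the sketch below (all field names are hypothetical) filters out transfer records carrying the uterine and maternal confounders discussed above before they reach model training:

```python
# Illustrative sketch (hypothetical field names): removing confounded
# records so that negative outcomes in the training set reflect embryo
# quality rather than uterine or maternal factors.

def clean_dataset(records, max_age=40):
    """Keep only records free of the confounders discussed in the review:
    adenomyosis, endometriosis, uterine structural anomalies, and very
    advanced maternal age (> 40 years)."""
    confounders = ("adenomyosis", "endometriosis", "uterine_anomaly")
    return [
        r for r in records
        if r["maternal_age"] <= max_age
        and not any(r.get(c, False) for c in confounders)
    ]

records = [
    {"id": 1, "maternal_age": 34, "outcome": 1},
    {"id": 2, "maternal_age": 43, "outcome": 0},                        # excluded: age
    {"id": 3, "maternal_age": 31, "endometriosis": True, "outcome": 0}, # excluded: confounder
]
clean = clean_dataset(records)  # only record 1 survives
```

In practice the exclusion criteria would be set clinically and documented in the dataset description, which is precisely the transparency this review calls for.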
Importance of Dataset Split or Class Distribution
Highlighting the significance of mirroring real-world scenarios in dataset class distribution, it is crucial to note that, typically, close to 40% of embryos are classified as good usable blastocysts, whereas close to 60% are categorised as poor blastocysts.[59] A recent study suggested that altering the class distribution from 30:70 to 50:50 does not significantly affect the AI model performance.[37] However, another study cautioned against oversampling the negative class (poor embryos), as it introduces a notable negative bias in the AI model.[15] Thus, advocating for a closer-to-reality class distribution of approximately 40% good and 60% poor seems logical for robust AI training.
Deciding on the correct dataset split for training AI models in blastocyst assessment is a critical consideration. Here is a concise overview of the pros and cons associated with a 50%–50% split and a more practical 40%–60% split.
50%–50% Split of Good and Bad Blastocyst Data
A 50%–50% split ensures a balanced representation of positive and negative outcomes, simplifying the learning environment. Evaluation metrics, such as accuracy and precision, are straightforward because of the equal class distribution. However, this approach may be misaligned with the prevalent blastocyst distribution in practical IVF labs, which is 30%–40% good usable blastocysts and 60%–70% poor blastocysts.[59] This could potentially hinder the real-world applicability of the model. It may also impact the sensitivity-specificity balance of the model.
Sensitivity refers to the AI’s ability to correctly identify a good blastocyst when it is indeed good, whereas specificity denotes its ability to accurately identify a poor blastocyst when it is indeed poor. If the training data does not reflect real-world distribution, the AI model may become biased. For example, the model might develop a high sensitivity for detecting good blastocysts but exhibit lower specificity in identifying poor blastocysts. This imbalance arises because model predictions are influenced by the characteristics of the training data, leading to potential performance discrepancies between different classes.
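The trade-off described above can be made concrete with the standard confusion-matrix definitions ('good blastocyst' as the positive class; the counts below are purely illustrative):

```python
# Illustrative sketch of the sensitivity/specificity definitions used in
# the text. TP/FN/TN/FP counts are hypothetical, not from any study.

def sensitivity(tp, fn):
    # Of all truly good blastocysts, the fraction the model flags as good.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Of all truly poor blastocysts, the fraction the model flags as poor.
    return tn / (tn + fp)

# A model trained on a 50:50 split but deployed on a realistic 40:60
# population might, for example, retain high sensitivity while its
# specificity drops:
print(round(sensitivity(tp=36, fn=4), 2))   # 0.9
print(round(specificity(tn=42, fp=18), 2))  # 0.7
```

Reporting both metrics, rather than accuracy alone, makes such class-dependent performance discrepancies visible.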
40% Good – 60% Bad Split
Opting for a 40%–60% or 30%–70% split aligns more realistically with the actual prevalence of blastocyst grades in practical IVF labs, which is close to the competence value of a good blastocyst development rate for most IVF labs worldwide.[59] This enhances the relevance of the model to clinical practice by providing a distribution that better mirrors real-world scenarios. Additionally, a more representative split improves the model’s sensitivity for positive outcomes, capturing patterns related to the minority class (good blastocysts). However, this approach introduces challenges in model evaluation, particularly with respect to accuracy interpretation, and there is a potential for an increased risk of false positives.[60] It is important to note that an imbalanced class distribution, if present, can contribute to overfitting, where the model may become biased towards more prevalent grades during training, potentially impacting its generalisation to diverse blastocyst grades in real-world applications.
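The risk described above can be illustrated with simple, hypothetical arithmetic: under a 40% good/60% poor distribution, a trivial model that labels every embryo as poor appears reasonably accurate while being clinically useless.

```python
# Hypothetical numbers: 40% good (positive) and 60% poor (negative) embryos.
n_good, n_poor = 400, 600

# A trivial model that predicts "poor" for every embryo:
tp, fn = 0, n_good      # every good blastocyst is missed
tn, fp = n_poor, 0      # every poor blastocyst is flagged as poor

accuracy = (tp + tn) / (n_good + n_poor)   # 0.60, looks acceptable
sensitivity = tp / (tp + fn)               # 0.00, no good embryo is found
specificity = tn / (tn + fp)               # 1.00
```

This is why imbalanced datasets demand class-aware evaluation metrics rather than raw accuracy.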
Importance of Embryo Transfer Strategy in Studies
In studies aiming to associate a predictive score with embryos or assess embryo viability during blastocyst assessment, the choice of embryo transfer strategy used during dataset collection has the potential to impact AI model performance. If the AI model is trained solely on fresh transfers, various confounders, like uterine receptivity and endometrial thickness at the time of transfer, may affect the ground truth. Failure to account for these confounders can introduce bias during model training. Therefore, an ideal strategy would be to collect the dataset from frozen embryo transfers (FETs) alone, avoiding a mix of fresh and frozen transfers, which could help mitigate confounding factors and enhance model robustness.
Importance of Using Clinical Metadata Along with Images
In the context of metadata utilised by models, it is essential to acknowledge that the implantation potential of an embryo is influenced by various factors, with maternal age being one of the most significant contributors.[61] Models aimed at predicting implantation or assigning a viability score can greatly benefit from incorporating patient age and other relevant clinical parameters into their analysis.
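As a hypothetical sketch (the function name, normalisation ranges and feature layout are illustrative, not taken from any reviewed model), clinical metadata such as maternal age can be appended to image-derived features before a final prediction layer:

```python
def combine_features(image_features, maternal_age, bmi):
    """Concatenate an image embedding with normalised clinical metadata.

    The normalisation ranges are assumptions: the 23-38 year age range
    and a normal BMI band, matching the proposed dataset criteria.
    """
    age_norm = (maternal_age - 23) / (38 - 23)    # scale age to roughly [0, 1]
    bmi_norm = (bmi - 18.5) / (25.0 - 18.5)       # scale normal BMI to [0, 1]
    return list(image_features) + [age_norm, bmi_norm]
```

The combined vector could then feed a downstream classifier, allowing the model to condition its viability score on patient-level factors rather than on morphology alone.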
Proposed Characteristics of a Gold Standard Dataset
A gold standard dataset for AI-based blastocyst assessment should be designed to support robust training, validation, and testing of clinically applicable models. Based on insights from existing AI tools, such a dataset should ideally include between 50,000 and 100,000 high-quality images of expanded blastocysts, sourced from multiple clinics to ensure diversity and generalisability. A minimum threshold of 20,000 well-curated images may serve as a baseline when image quality, patient selection and imaging consistency are tightly maintained. These images should be divided into 70% for training, 15% for validation and 15% for testing the AI model.[62]
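The proposed 70%/15%/15% division can be sketched as a stratified split that preserves the class ratio in each subset (an illustrative implementation, not a prescribed one):

```python
import random

def stratified_split(items, labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split labelled images into train/validation/test sets,
    preserving the class ratio within each subset."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)
    train, val, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n = len(members)
        n_train = int(fractions[0] * n)
        n_val = int(fractions[1] * n)
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])
    return train, val, test
```

Splitting per class (rather than over the pooled dataset) guarantees that the 40:60 good-to-poor ratio survives in all three subsets.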
The dataset should consist exclusively of expanded blastocysts from autologous IVF cycles, with one embryo per image and no other embryos, microtools, or visual distractions in the frame. Each image must clearly display the entire blastocyst in sharp focus, with identifiable ICM, TE and zona pellucida. Image capture should be performed at a consistent time point between 110 and 116 hpi using a 20x objective lens on an inverted microscope, resulting in a total magnification of approximately 200 times. A minimum image resolution of 500 by 500 pixels should be maintained. The dataset should include patients aged between 23 and 38 years with normal BMI and exclude cases with uterine abnormalities such as endometriosis, adenomyosis, congenital anomalies, or recurrent implantation failure. Only frozen cycles should be included to eliminate variability due to endometrial receptivity. The ground truth for each image should be defined as the presence or absence of foetal cardiac activity between 6 and 8 weeks of gestation.
All images should be annotated with clinical outcome labels for classification purposes. In addition, a smaller subset of 500–2000 images should be structurally annotated using standard annotation tools to clearly mark the ICM, TE and zona pellucida. These detailed annotations are essential for training segmentation models or for developing region-focused explainable AI tools. A class distribution of approximately 40% positive and 60% negative outcomes should be maintained to reflect real-world clinical conditions.
LIMITATIONS
This systematic review has a few limitations, including the potential risk of bias in the selected studies. The decision not to use established risk-of-bias assessment tools introduces uncertainty into the analysis. Although tailored dataset assessments were used to mitigate bias, the absence of standardised tools meant relying on the quality and transparency of the individual studies, which warrants acknowledgement.
Another limitation is the review’s exclusive focus on the English language literature, potentially introducing language bias. This choice may exclude relevant studies in other languages, thus limiting the representation of diverse perspectives and findings. Consideration of this bias is crucial for interpreting the comprehensive scope and generalisability of the review.
The lack of a standardised parameter set for comparing databases is a significant limitation. Most reviewed studies lack comprehensive dataset reporting and vary in the parameters covered. This reporting variability poses challenges to the synthesis of information across studies.
Additionally, the diverse endpoints pursued in the reviewed studies, ranging from ranking embryos to predicting live births, complicate the dataset comparison. Varying endpoints hinder direct parallels among datasets, adding complexity to the analysis.
This review was limited to various dataset characteristics and their importance in model training. Ideally, a comprehensive study would compare these characteristics and assess their impact on the AI model’s performance. However, this task is complex and challenging because of the diverse nature of datasets, variability in model architectures, and multifactorial interactions among these elements. Future studies should build on our findings and focus on quantifying how specific dataset parameters influence the model performance.
CONCLUSION
In conclusion, our systematic review emphasises the urgent need for standardised approaches to dataset preparation for AI models used for blastocyst assessment. Datasets should feature diverse patient demographics, balanced negative and positive sampling, uniform image capture timing, uniformity in endpoint, and removal of confounding factors that can directly impact the endpoint. Additionally, consistent embryo transfer strategies and standardised annotations of patient characteristics are essential to achieve reliable endpoints.
Restricted dataset accessibility poses challenges for future researchers and prolongs model development. Our review highlights the urgent need for a standardised dataset of blastocyst images used for AI model training and testing. The availability of a gold standard dataset, such as the annotated human blastocyst dataset, would establish a common benchmark for researchers, facilitating accurate comparison of AI models and promoting standardised practices. Ultimately, this initiative would enhance collaboration, optimise model performance, and contribute to the evolution of AI-driven blastocyst evaluation.
Acknowledgement
I want to express my sincere gratitude to everyone who played a crucial role in the successful completion of this research. A special appreciation goes to my Ph.D. guide and mentors, whose unwavering guidance and invaluable insights have shaped this study.
I would like to acknowledge and thank the dedicated team at my clinic for their constant support and collaborative efforts, which have been pivotal to the execution of this research.
Author contributions:
DBP: Conceptualisation, search strategy design, primary data extraction, and manuscript drafting; HD: Data collection, literature screening, and support in data synthesis; SK: Quality assessment of included studies and critical revision of the manuscript; SKV: Interpretation of findings, guidance on methodological framework, and manuscript refinement. All authors reviewed and approved the final version of the manuscript.
Ethical approval:
Institutional Review Board approval is not required.
Declaration of patient consent:
Patient’s consent is not required as there are no patients in this study.
Financial support and sponsorship:
Nil.
Conflicts of interest:
There are no conflicts of interest.
Use of artificial intelligence (AI)-assisted technology for manuscript preparation:
The authors confirm that they have used artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript or image creation. The AI tools utilised during the preparation of this systematic literature review are as follows: Open Knowledge Maps – For literature search, visualisation and text mining, Inciteful – For locating highly cited manuscripts, CiteFast – For automated citation generation, PaperPal – For structural verification and grammar check of the manuscript, Trinka – For checking the grammar of the manuscript, ChatGPT – For alphabetical ordering of references in the manuscript. After utilising the services of these tools, the author(s) thoroughly reviewed and edited the content as necessary, assuming full responsibility for the publication's content.
REFERENCES
- Medical Image Analysis Based on Deep Learning Approach. Multimed Tools Appl. 2021;80:24365-98.
- Culture and Selection of Viable Blastocysts: A Feasible Proposition for Human IVF? Hum Reprod Update. 1997;3:367-82.
- Assessment of Embryo Viability: The Ability to Select a Single Embryo for Transfer—A Review. Placenta. 2003;24:S5-S12.
- IVF/ICSI Twin Pregnancies: Risks and Prevention. Hum Reprod Update. 2005;11:575-93.
- The Istanbul Consensus Workshop on Embryo Assessment: Proceedings of an Expert Meeting. Hum Reprod. 2011;26:1270-83.
- Randomized Comparison of Two Different Blastocyst Grading Systems. Fertil Steril. 2006;85:559-63.
- The Clinical Use of Time-Lapse in Human-Assisted Reproduction. Ther Adv Reprod Health. 2020;14:263349412097692.
- Characterization of a Top Quality Embryo, a Step Towards Single-Embryo Transfer. Hum Reprod. 1999;14:2345-9.
- A Survey on Deep Learning in Medical Image Analysis. Med Image Anal. 2017;42:60-88.
- Recent Advances and Clinical Applications of Deep Learning in Medical Image Analysis. Med Image Anal. 2022;79:102444.
- Reporting on the Value of Artificial Intelligence in Predicting the Optimal Embryo for Transfer: A Systematic Review including Data Synthesis. Biomedicines. 2022;10:697.
- Image Processing Approach for Grading IVF Blastocyst: A State-of-the-Art Review and Future Perspective of Deep Learning-Based Models. Appl Sci. 2023;13:1195.
- Embryo Selection Through Artificial Intelligence Versus Embryologists: A Systematic Review. Hum Reprod Open. 2023;2023
- Data Set Quality in Machine Learning: Consistency Measure Based on Group Decision Making. Appl Soft Comput. 2021;106:107366.
- Characterization of an Artificial Intelligence Model for Ranking Static Images of Blastocyst Stage Embryos. Fertil Steril. 2022;117:528-35.
- The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. Syst Rev. 2021;10:89.
- Feasibility of Artificial Intelligence for Predicting Live Birth Without Aneuploidy from a Blastocyst Image. Reprod Med Biol. 2019;18:204-11.
- Using Deep Learning with Large Dataset of Microscope Images to Develop an Automated Embryo Grading System. Fertil Reprod. 2019;1:51-6.
- A Non-Invasive Artificial Intelligence Approach for the Prediction of Human Blastocyst Ploidy: A Retrospective Model Development and Validation Study. Lancet Digit Health. 2023;5:e28-e40.
- Deep Learning Enables Robust Assessment and Selection of Human Blastocysts After In Vitro Fertilization. NPJ Digit Med. 2019;2:21.
- A Deep Learning Framework Design for Automatic Blastocyst Evaluation With Multifocal Images. IEEE Access. 2021;9:18927-34.
- Consistency and Objectivity of Automated Embryo Assessments Using Deep Neural Networks. Fertil Steril. 2020;113:781-7.e1.
- Performance of a Deep Learning Based Neural Network in the Selection of Human Blastocysts for Implantation. Elife. 2020;9:e55301.
- Predicting a Live Birth by Artificial Intelligence Incorporating Both the Blastocyst Image and Conventional Embryo Evaluation Parameters. Artif Intell Med Imaging. 2020;1:94-107.
- Development and Evaluation of Inexpensive Automated Deep Learning-Based Imaging Systems for Embryology. Lab Chip. 2019;19:4139-45.
- Deep Learning as a Predictive Tool for Fetal Heart Pregnancy Following Time-Lapse Incubation and Blastocyst Transfer. Hum Reprod. 2019;34:1011-8.
- Development of an Artificial Intelligence Model for Predicting the Likelihood of Human Embryo Euploidy Based on Blastocyst Images from Multiple Imaging Systems During IVF. Hum Reprod. 2022a;37:1746-59.
- A Machine Learning System with Reinforcement Capacity for Predicting the Fate of an ART Embryo. Syst Biol Reprod Med. 2021;67:64-78.
- Embryo Ranking Intelligent Classification Algorithm (ERICA): Artificial Intelligence Clinical Assistant Predicting Embryo Ploidy and Implantation. Reprod Biomed Online. 2020;41:585-93.
- An Artificial Intelligence Model (Euploid Prediction Algorithm) Can Predict Embryo Ploidy Status Based on Time-Lapse Data. Reprod Biol Endocrinol. 2021;19:185.
- An Artificial Intelligence Model Correlated with Morphological and Genetic Features of Blastocyst Quality Improves Ranking of Viable Embryos. Reprod BioMed Online. 2022b;45:1105-17.
- Towards Automation in IVF: Pre-Clinical Validation of a Deep Learning-Based Embryo Grading System During PGT-A Cycles. J Clin Med. 2023;12:1806.
- Enhancing Clinical Utility: Deep Learning-Based Embryo Scoring Model for Non-Invasive Aneuploidy Prediction. Reprod Biol Endocrinol. 2024;22:58.
- Combined Input Deep Learning Pipeline for Embryo Selection for In Vitro Fertilization Using Light Microscopic Images and Additional Features. J Imaging. 2025;11:13.
- An Artificial Intelligence Algorithm for Automated Blastocyst Morphometric Parameters Demonstrates a Positive Association with Implantation Potential. Sci Rep. 2023;13:12345.
- An Artificial Intelligence-Based Algorithm for Predicting Pregnancy Success Using Static Images Captured by Optical Light Microscopy During Intracytoplasmic Sperm Injection. J Hum Reprod Sci. 2021;14:288-92.
- Robust and Generalizable Embryo Selection Based on Artificial Intelligence and Time-Lapse Image Sequences. PLoS One. 2022;17:e0262661.
- A Novel System Based on Artificial Intelligence for Predicting Blastocyst Viability and Visualizing the Explanation. Reprod Med Biol. 2022;21:e12443.
- Development and Evaluation of a Live Birth Prediction Model for Evaluating Human Blastocysts from a Retrospective Study. Elife. 2023;12:e83662.
- Development of an Artificial Intelligence Based Model for Predicting the Euploidy of Blastocysts in PGT-A Treatments. Sci Rep. 2023;13:2322.
- Improved Prediction of Clinical Pregnancy Using Artificial Intelligence with Enhanced Inner Cell Mass and Trophectoderm Images. Sci Rep. 2024;14:3240.
- Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Appl Sci. 2021;11:796.
- Impact of Quality, Type and Volume of Data Used by Deep Learning Models in the Analysis of Medical Images. Inform Med Unlock. 2022;29:100911.
- Evaluation of a Decided Sample Size in Machine Learning Applications. BMC Bioinform. 2023;24:48.
- Blastocyst Formation Rate for Asians Versus Caucasians and Within Body Mass Index Categories. J Assist Reprod Genet. 2020;37:933-43.
- A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis. Domain Adapt Represent Transf Afford Healthc AI Resour Divers Glob Health. 2021;12968:3-13.
- Domain Adaptation for Medical Image Analysis: A Survey. IEEE Trans Biomed Eng. 2021;69:1173-85.
- Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations Without Sharing Patient Data. Sci Rep. 2020;10:12598.
- P-306 Generalizable AI Model for Microscopic and Timelapse Multifocal Embryo Images. Hum Reprod. 2023;38:dead093.664.
- P-171 Sensitivity Analysis of an Embryo Grading Artificial Intelligence Model to Different Focal Planes. Hum Reprod. 2022;37:deac107.166.
- Assessment of Human Embryo Development Using Morphological Criteria in an Era of Time-Lapse, Algorithms and “OMICS”: Is Looking Good Still Important? Mol Hum Reprod. 2016;22:704-18.
- Cumulative Live Birth Rates and Number of Oocytes Retrieved in Women of Advanced Age. A Single Centre Analysis Including 4500 Women ≥38 Years Old. Hum Reprod. 2018;33:2010-7.
- Effect of Body Mass Index on IVF Treatment Outcome: An Updated Systematic Review and Meta-Analysis. Reprod BioMed Online. 2011;23:421-39.
- Effect of Endometriosis on IVF/ICSI Outcome: Stage III/IV Endometriosis Worsens Cumulative Pregnancy and Live-Born Rates. Hum Reprod. 2005;20:3130-5.
- Automated Detection of Poor-Quality Data: Case Studies in Healthcare. ProQuest. 2012;11:18005.
- Imbalanced Class Distribution and Performance Evaluation Metrics: A Systematic Review of Prediction Accuracy for Determining Model Performance in Healthcare Systems. PLOS Digit Health. 2023;2:e0000290.
- Effect of Maternal Age on the Outcomes of In Vitro Fertilization and Embryo Transfer (IVF-ET). Sci China Life Sci. 2012;55:694-8.
- Using Deep Learning to Predict the Outcome of Live Birth from More Than 10,000 Embryo Data. BMC Pregnancy and Childbirth. 2022;22

