Welcome! I hope you have some time to spare, as this article is not a short one. It might be a good idea to grab a big cup of coffee and some snacks before diving into this part of computer vision. We’ll explore the reasons and occasions for synthetic image generation, as well as the software tools that can be used for it. We’ll guide you through the process of creating a sample dataset. If you’re interested, you can download this dataset to experiment with neural network development.
Preface
As humans, we have always tried to imbue machines with some form of intelligence, attempting to transfer our own meaning and logic into them. This endeavor dates back a long way in history, and while it was often propelled by excessive hype, it has now evolved into a legitimate field with long-term prospects.
Today, we give away our data almost effortlessly. As the internet became more widespread, we have posted countless images and photos, often accompanied by annotations, providing a wealth of information for artificial intelligence to learn from.
And it’s not just images – our shopping habits, location, emotional state, and countless other pieces of personal information are also being collected. The insatiable appetite of social media and the internet consumes it all. In return, their many artificial intelligence products help our lives every day.
However, the industry still holds onto its secrets. Companies rely solely on their own data, measured on site and kept under strict confidentiality. Widely available neural networks are incapable of recognizing these products, as they have never been trained on any similar images.
And sometimes these images don’t even exist. Even if a product has been in production for a long time, there may not be much image data available. This has led to a reliance on artificially generated images for training models, with real images used only for validation.
Thankfully, we have reached a point in technological development where creating realistic images has become an option available to anyone. A wide array of graphics programs, both 2D and 3D, are available to the general public. You don’t have to be a professional graphic designer to create an image, but knowledge of the program and the required domain is necessary.
What is synthetic image generation and why is it good for us?
Synthetic image data has become increasingly popular across various fields in recent years. Instead of relying on physical devices like cameras, synthetic data generation uses software or algorithms to create computer-generated images. This approach offers several advantages over traditional image data collection methods.
One of the biggest advantages is the greater control over image properties. With synthetic data generation, we have control over the entire data set, including lighting, object placement, and object variation. This means that we can create a dataset with specific characteristics that we need for a particular task.
Another advantage is the ability to generate large datasets quickly and at low cost. While producing a synthetic image from scratch can be challenging, once a ready-made model exists, generating additional images is far cheaper than producing real ones. This allows researchers and developers to train and test their algorithms on much larger datasets, which can lead to more accurate results.
However, there are also some challenges associated with synthetic data generation. For example, it can be challenging to create truly realistic images that accurately reflect real-world scenarios. It requires a deep understanding of the domain and the ability to use software that can create high-quality images.
Despite these challenges, synthetic image data generation has enormous potential in a wide range of applications. As technology continues to advance, we can expect to see even more sophisticated methods for generating synthetic data, which will further expand the possibilities.
Before I get to the part about when to generate synthetic images, I would like to point out one of AI’s big problems: the data and its annotation. Sorting and annotating images is an incredibly time-consuming task (and I think it’s even boring – veryveryvery… boring), even with the best annotation algorithms, such as Meta’s latest invention, SAM (Segment Anything Model). Moreover, it is done by people, who can make mistakes. Our MInD platform also supports multi-user workflows in which annotators check each other’s work. However, this only makes the process even slower.
This is where synthetically generated data is very useful: when the data is created, I know exactly what is in the image and where it is. As the image is generated, I can create masks immediately, and I can place annotation information in the image’s file name (or even in a separate CSV/JSON file).
It is a general truth that if you generate an image, you should immediately annotate it!
If you do all this right, the whole annotation process can be skipped, which means saving an amazing amount of time.
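As a minimal sketch of what “annotate while you generate” can look like in practice, here is one way to save a rendered image together with the masks and bounding boxes that are known for free at generation time. This assumes Python with NumPy and OpenCV; the file names and JSON layout are purely illustrative, not the MInD convention.

```python
import json
import os

import cv2
import numpy as np


def save_generated_sample(index, image, masks, out_dir="dataset"):
    """Save a rendered image together with the annotations we already know.

    image : HxWx3 uint8 array (the rendered color image)
    masks : dict mapping object id -> HxW uint8 binary mask (0/1)
    """
    os.makedirs(out_dir, exist_ok=True)
    n_objects = len(masks)

    # Encode the object count directly in the file name, as described above.
    color_name = os.path.join(out_dir, f"img_{index:05d}_objects{n_objects}.jpg")
    cv2.imwrite(color_name, image)

    annotations = []
    for obj_id, mask in masks.items():
        # Lossless PNG for masks so the pixel values survive compression untouched.
        mask_name = os.path.join(out_dir, f"img_{index:05d}_mask{obj_id}.png")
        cv2.imwrite(mask_name, mask * 255)

        # The bounding box is known without any manual annotation.
        ys, xs = np.nonzero(mask)
        annotations.append({
            "id": int(obj_id),
            "bbox": [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())],
        })

    with open(os.path.join(out_dir, f"img_{index:05d}.json"), "w") as f:
        json.dump({"image": color_name, "objects": annotations}, f, indent=2)
```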
When should images be generated?
The simplest answer is when they are needed. However, there are four areas where synthetic images can be particularly useful. The first and second are actually almost the same.
The first is when there is no image available. For example, in the case of a proof of concept, or a brand-new (“zero-kilometer”) machine where only a very few measurements have been taken. Synthetic images can be created based on this small number of measurements and used to train neural networks before the machine is even ready.
The second area is when there are not enough images. The industry generally requires 99.99% accuracy or better, which can require a minimum of 10,000 images. However, clients may not be able to provide enough images due to various reasons such as lack of storage or a lengthy measurement process. In such cases, synthetic images can be generated to supplement the dataset.
The third area concerns data balance. In the production process, the industry strives to ensure that products are defect-free. This can result in an imbalanced dataset, with only a few images containing visible defects. To train the neural network effectively, synthetic defect images must be produced until they match the number of defect-free images, maintaining the balance of the dataset.
From a manufacturer’s perspective, this situation can be quite challenging as it becomes extremely difficult, if not impossible, to produce a faulty product intentionally. In other words, if a manufacturer wants to create a product with specific types of defects, it becomes a challenging task due to the system’s strict quality control and error detection capabilities.
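To make the balancing idea from above concrete, here is a tiny sketch of how one could work out how many synthetic images to generate per defect class. The counts are made up for illustration.

```python
# Hypothetical counts from a production line: almost everything is defect-free.
real_counts = {"defect_free": 9800, "scratch": 35, "dent": 12}

# Match every class to the largest one.
target = max(real_counts.values())
to_generate = {cls: target - n for cls, n in real_counts.items() if n < target}

print(to_generate)  # {'scratch': 9765, 'dent': 9788} synthetic images needed
```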
The fourth area where synthetic images can be useful is when it is difficult to annotate images. This can be the case when the image produced by the measurement is too complex or when no known algorithm can help with annotation. In this case, it may be worth producing the image artificially with appropriate annotations, especially when dealing with thousands of images.
What should you pay attention to when generating synthetic images?
Generating synthetic images is not a straightforward process. To create realistic images, you need to have the right tools and knowledge of appropriate programs. Moreover, it’s essential to add a diverse range of variations to the images; otherwise, the model trained on them will quickly overfit.
When generating synthetic images, there are several factors to consider, such as lighting, camera angle, object placement, and the level of detail required. It’s also crucial to pay attention to the dataset’s balance, as the model needs to be trained on a wide variety of images to learn effectively.
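As an illustration of the kind of variation meant here, a generator script might draw a fresh set of scene parameters for every image along these lines. This is only a sketch; the parameter names and ranges are invented for the example, not taken from any particular renderer.

```python
import random


def random_scene_params(rng: random.Random) -> dict:
    """Draw a fresh, randomized set of scene parameters for one synthetic image."""
    return {
        "light_intensity": rng.uniform(0.6, 1.4),      # relative to a nominal setup
        "light_azimuth_deg": rng.uniform(0, 360),
        "camera_pitch_deg": rng.uniform(35, 55),
        "object_count": rng.randint(3, 12),
        "object_rotations_deg": [rng.uniform(0, 360) for _ in range(12)],
    }


rng = random.Random(42)  # fixed seed -> reproducible dataset
params_for_five_images = [random_scene_params(rng) for _ in range(5)]
```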
One essential aspect to keep in mind when generating synthetic images is the image’s realism. If the synthetic images don’t look realistic, they may not be suitable for training a machine learning model. This is because the model may not recognize real-world images if the training images are too different from them.
What software is worth using?
When it comes to generating synthetic images, the choice of software can be overwhelming. There are numerous programs available, each with its own set of features and capabilities. So, which ones should you use?
The answer is simple: use what you know.
The software you use is just a tool, and it will only work as well as you understand it. If you are already proficient in a particular program, stick with it. However, creating a complex synthetic image may require knowledge of several programs, as no single program can do everything.
For MInD, we use a variety of software tools, depending on the specific needs of the project.
A little 2D (and some 3D)
When it comes to 2D, I think Adobe is the first thing that comes to mind.
Adobe is a well-established player in the field of media software, and their product line includes a wide range of tools worth considering when generating synthetic images or creating resources for other programs. From Adobe Photoshop, the industry standard for image editing, to Adobe Illustrator, a vector graphics editor, and even the Substance product line for procedural texture creation, Adobe offers a variety of options for generating the images you need in the pipeline.
One of the benefits of using Adobe products is the seamless data flow between them. Completed work can be easily transferred from one software to another, streamlining the workflow and saving time.
While Photoshop is a go-to tool for many image creators, other options such as Gimp, which is a free and open-source image editor, can also be used to create synthetic images.
In addition to traditional desktop software, there are also mobile options such as Procreate, which is a popular drawing app available on iPad. This can be a convenient option for drawing textures on a 2D surface or a 3D object in a more relaxed setting like a garden swing.
When it comes to post-processing of rendered images, Adobe products can be quite useful. However, it’s important to carefully consider what elements should be permanently included in the synthetic images versus those that can be adjusted at runtime through augmentation techniques like rotation, brightness and contrast adjustment, among others. The good news is that MInD provides a comprehensive range of options for image augmentation during the training process. This allows for greater flexibility in fine-tuning the training data to ensure optimal results.
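As a rough sketch of the runtime adjustments meant here (rotation, brightness, contrast), the snippet below uses OpenCV and NumPy. MInD’s own augmentation pipeline is configured through the platform; this only illustrates the principle, and the ranges and file name are assumptions.

```python
import cv2
import numpy as np


def augment(image, rng: np.random.Generator):
    """Apply a random rotation and a brightness/contrast shift at training time."""
    h, w = image.shape[:2]

    # Random rotation around the image center.
    angle = rng.uniform(-15, 15)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, m, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Random contrast (alpha) and brightness (beta) adjustment.
    alpha = rng.uniform(0.8, 1.2)
    beta = rng.uniform(-20, 20)
    return cv2.convertScaleAbs(image, alpha=alpha, beta=beta)


rng = np.random.default_rng(0)
image = cv2.imread("candy_bag.jpg")   # hypothetical path to a rendered image
augmented = augment(image, rng)
```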
3D
Creating 3D models requires specialized software, and there are many options available today like 3D Studio Max or Maya. If cost is a concern, Blender is a great free alternative. Lately, I’ve been exploring Houdini for its powerful procedural capabilities, which I hope to use to create diverse and interesting objects. Most of these programs offer free trial versions, so I recommend trying out several modeling, texturing, and rendering processes to find the one that suits your needs and preferences. Ultimately, the software you choose should be the one that you’re most comfortable using.
The key factor to consider when generating synthetic images is not producing an AAA-quality movie, but generating multiple high-quality images of a particular object. Therefore, the choice of 3D software is not critical, since all of them provide the ability to model, texture, and render images. Ultimately, it is difficult to go wrong with any of the available options.
When it comes to 3D modeling, it’s worth trying out sculpting software such as ZBrush, or even the Nomad app on an iPad. However, most 3D modeling programs now come with built-in sculpting options that are definitely worth exploring.
Photogrammetry is a fascinating technique for creating textured 3D models from photographs, even those taken with a phone camera. While it may not be straightforward in some cases, photogrammetry can offer immense convenience as it eliminates the need for manual modeling of the object. For starters, the RealityScan app can be downloaded to try it out on a mobile device. If you want to take it a notch higher, paid programs such as Autodesk Recap and RealityCapture are also available. It’s worth noting that the final model may have some imperfections, but most 3D modeling programs offer good retopology options to simplify and clean up the captured model.
Rendering is the final stage of the 3D process and it’s a complex topic that can fill hundreds of pages. In our case, I can divide it into three main parts.
The first option is CPU rendering, which is supported by all 3D programs. It is the simplest method, yet it can handle everything from simple materials to complex light refractions and shading methods. This is a great option for models with intricate physical characteristics, but rendering can take a while depending on the hardware.
GPU rendering is the second option, which may not be integrated into the software by default. You can check if your chosen software has GPU rendering built-in, or you can use an external plugin such as Redshift or Octane. These are significantly faster than CPU rendering, which is especially beneficial when generating thousands of images. It’s a great option for models with unique physical characteristics.
Lastly, there are real-time renderers such as the Unreal Engine, Unity, and NVidia’s Omniverse package. While they can render dozens of images in just a few seconds, they may not accurately represent complex physical parameters. However, this is usually not an issue for most projects.
OpenCV
OpenCV is an extremely powerful tool that should not be underestimated when it comes to computer vision and image processing. However, it is not as user-friendly as some of the other programs mentioned earlier. To use OpenCV to its full potential, you must have a solid understanding of programming and be familiar with the features and capabilities of this graphics package.
OpenCV can be used for a wide range of tasks, from simple image filtering to advanced computer vision algorithms. It provides a variety of functions for image manipulation, feature detection, object recognition, and more. Additionally, OpenCV is compatible with several programming languages, including Python, C++, and Java.
In addition, with OpenCV we can generate unique, random mathematical patterns that often come in handy when generating synthetic images.
Finally, OpenCV is used for post-processing of rendered images, if any further adjustments are needed after the 3D modeling process.
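For instance, a random interference-style pattern of the sort mentioned above can be produced in a few lines. This is just a sketch with NumPy and OpenCV; the exact formula is arbitrary and only meant to show the idea.

```python
import cv2
import numpy as np


def random_pattern(size=512, seed=None):
    """Generate a random sine-interference pattern as an 8-bit grayscale image."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size].astype(np.float32) / size

    pattern = np.zeros((size, size), dtype=np.float32)
    for _ in range(4):  # sum a few random plane waves
        fx, fy = rng.uniform(2, 12, size=2)      # random spatial frequencies
        phase = rng.uniform(0, 2 * np.pi)
        pattern += np.sin(2 * np.pi * (fx * x + fy * y) + phase)

    # Normalize to 0..255 and add a touch of blur for a softer look.
    pattern = cv2.normalize(pattern, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.GaussianBlur(pattern.astype(np.uint8), (5, 5), 0)


cv2.imwrite("pattern.png", random_pattern(seed=1))
```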
Synthetic example
This particular example is a simplified version of a more complex request where the robot had to find the top of deformable packages for smooth gripping. As a bonus, we included the order in which the bags should be removed according to the current state. Of course, this would be re-evaluated after each bag intake.
The initial stage involves the analysis of one or more camera images. It is essential to consider factors such as resolution and lighting to determine the level of detail required in the synthetic image and the key elements that need highlighting. For our project, we used a 1024×1024 resolution image, and the example was rendered at the same resolution.
I will demonstrate how to create a candy bag prototype with the major steps, without going into specific details.
Before we proceed, it’s important to have a basic plan of what we will require from the rendering. To begin with, we will need a color image displaying the candy bags; it can be a JPG file. For each bag, we will also need a mask image containing some sort of ID, allowing us to distinguish between the bags. Lastly, a merged image of these masks will be necessary. For mask images, my general rule is to use lossless compression, e.g. a PNG file.
To help the post-processing work, the number of bags that should be visible in the image is written into the color image’s file name.
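To make this plan concrete, a post-processing script might load one rendered sample like this. It is only a sketch: the exact file naming (e.g. "render_00042_bags7.jpg") is an assumption for illustration, not the literal output of the pipeline.

```python
import glob
import re

import cv2


def load_sample(color_path):
    """Load a color image, its per-bag masks and the merged mask,
    and cross-check the bag count encoded in the file name."""
    # e.g. "render_00042_bags7.jpg" -> expected_bags = 7 (illustrative naming)
    expected_bags = int(re.search(r"bags(\d+)", color_path).group(1))

    color = cv2.imread(color_path)                        # lossy JPG is fine here
    stem = color_path.rsplit(".", 1)[0]
    merged = cv2.imread(f"{stem}_merged.png", cv2.IMREAD_GRAYSCALE)
    masks = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
             for p in sorted(glob.glob(f"{stem}_mask*.png"))]  # lossless masks

    assert len(masks) == expected_bags, "file name and mask count disagree"
    return color, masks, merged
```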
So now that we know what the task is, let’s get down to it!
MODO 3D
The initial step was to create a 3D model. I use Modo from Foundry, but any 3D modeling software can be used. The objective is to generate a model that can be converted into numerous polygons via a SubDivision operation. UV coordinates of the 3D model were also created to enable texturing.
UVs, in 3D modeling, are two-dimensional coordinates used to map a texture onto the surface of a 3D object. They connect the vertices of a mesh to specific points on an image texture, allowing the texture to be applied accurately to the surface of the object.
Photoshop
Next, the texture for the 3D model was created in Photoshop. Several images were generated in DreamStudio; the result was a three-legged creature that my child liked, so it was incorporated into the design. Lorem ipsum text and a fake barcode were added to complete the texture. To prepare for real-time rendering, the textures were resized to 2048×2048, and a normal map was included to simulate surface unevenness.
Back to MODO 3D
Once the texture was ready, the next step was to test it on the 3D model. This is a crucial step as it allows you to see how the texture looks on the model and identify any potential issues.
During the testing phase, I made adjustments to the texture as needed to ensure that it looked as desired on the model, especially where the top and bottom of the texture meet. I also adjusted the normal map to ensure that the surface unevenness looked natural.
To add some variety to the candy bag, I decided to sculpt some bumps and depressions onto the model using the built-in tools in Modo. I ended up making five different versions to provide enough variety for this example project. This process was relatively simple and allowed me to add some subtle differences to each bag without having to create entirely new models.
By creating the 3D models with added bumps and depressions I simplified my upcoming tasks significantly. These models could now be treated as static meshes, and the only thing that needed to be examined was the collision between them. There was no need to run a soft body simulation, which would have made the task much more complex and extended the rendering time. This approach made the project more efficient and less time-consuming.
Although I carried out several model distortions, such as bending and twisting, for the real project, I skip them in this example. Nevertheless, I believe the gist of the idea comes across.
Unreal Engine
After completing the models, I imported them into Unreal Engine and created a collision mesh that was accurate enough.
I increased the collision detection rate between frames to ensure more reliable behavior. At 60 frames per second, there is a 16-millisecond interval between two frames. From a collision-testing viewpoint, that is a considerable duration: during this period, two bags might interpenetrate and remain entangled. You may have observed this kind of model twitching during gameplay.
It’s important to note that there is a trade-off between precision and speed: improving one comes at the cost of the other.
For this task, I used the stable and long-standing version 4.27 of Unreal Engine, although working in version 5 would probably not cause significant obstacles.
It’s time to begin constructing the world. For this scene, a random number of bags is generated in mid-air and allowed to drop, creating a basic simulation that significantly lengthens the creation of each image, as I must wait about 2-3 seconds for the bags to settle into place. A floor must also be added for the bags to fall onto.
To capture the images needed for the task, two cameras are required: one for the color image and another for the masks. These cameras are used for RenderTarget capturing, and the images are only taken at the precise moment when the bags have fallen into place, after which the results are saved.
Four additional collision walls are required to concentrate the falling bags in a specific area.
The lighting is done using an HDRI cubemap – a specialized image that illuminates the scene based on color and pixel intensity – in this case Greg Zaal’s work downloaded from https://polyhaven.com/a/paul_lobe_haus. To enhance the scene, two additional directional lights and two point lights were added, with the latter being randomly positioned and adjusted in brightness before each generated image, adding variety to the resulting images.
And then let’s code. In other words, let’s connect the nodes together.
In the Unreal engine, there are two options for programming: using Blueprint, where you can connect nodes, or using C++, which is a more complex programming language. For this particular project, the decision was made to use Blueprint due to its simplicity.
The process of the project can be visualized in the image below, where the different nodes and connections are shown. After pressing the play button, this program controls the entire image generation process. About 20-25 pictures are taken in one minute (depending on the number of bags per image), which is much faster than having to photograph the pictures in reality.
Over the course of a few hours, a total of 15,000 unique images were generated, each accompanied by the corresponding masks. The dataset is available on Kaggle via the following link: https://www.kaggle.com/datasets/machineint/candybag
In the next post on this topic, we will explain how we approach and solve this task in MInD.
Finally, here is a sample of a generated color image
Concluding remarks
In conclusion, it’s important to understand that there are no shortcuts or magic wands to mastering these skills. It takes time and practice to become proficient, and relying solely on automated tools is not enough.
Have fun exploring!
Kaggle Dataset link: https://www.kaggle.com/datasets/machineint/candybag

Founder of Machine Intelligence Zrt. and Ai-Fusion Kft.
Software developer with 10+ years of experience in industrial user interfaces. Former DTP operator with knowledge of many 2D and 3D applications.
Experienced in managing small teams (5-10 people).