Successful text automation with high-quality product data

Requirements for high data and text quality

Natural Language Generation (NLG) software generates natural-language content automatically, making it practical to describe a large number of products or services comprehensively and individually in a short time. For these product descriptions to be accurate and on target, however, sufficient structured, high-quality product data must be available: it is the basis on which automation achieves high text quality.

But even without perfect product data, it is worth taking a closer look at content automation, since in many cases the source data can still be made usable through various techniques. If a PIM (Product Information Management) system is already in use, the foundation for an optimal data structure has been laid. Experience shows, however, that a PIM system is not a mandatory prerequisite for automated text generation.

Requirements for data and text quality

Basically, four prerequisites are important for producing high-quality content from the available data using NLG:

  • Completeness. Not all product information is relevant, but the attributes that are to be used in the text should be complete. The highest possible fill rate of attribute values is crucial; it determines both the information density and the text quality.
  • Availability. The choice of data source does not in itself affect data quality, but it should be determined in advance to ensure that the data is actually available. In practice, the product data needed for automation often originates from several source systems and must first be aggregated before it can be processed. For a fully automated workflow, a PIM system connected directly to the text engine is again a good choice.
  • Granularity. It also pays to keep the product data as granular as possible. The text can then be tailored more specifically to the individual attributes and is less generic. Prospective buyers receive the best possible information about the specific product, its features, and its benefits. Granular data also improves product search and thus leads to a higher conversion rate.
  • Consistency. Finally, the product data should be consistent. Stray special characters, divergent spellings, and mixed word types must either be avoided or handled individually when text automation is implemented. The corrections this requires increase the complexity of the implementation and make it necessary to check the texts carefully whenever new products with inconsistent attribute values are added. On the upside, incorrect or inconsistent values can be quickly identified and corrected in the product data.
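Two of these criteria, completeness and consistency, are easy to measure before any text is generated. The following is a minimal sketch with hypothetical product records and attribute names; it computes the fill rate per attribute and lists divergent spellings of the same value:

```python
# Illustrative product records; attribute names are assumptions, not a real schema.
products = [
    {"name": "Trail Jacket", "material": "polyester", "color": "blue"},
    {"name": "City Parka",  "material": "Polyester", "color": ""},
    {"name": "Rain Coat",   "material": "poly-ester", "color": "red"},
]

REQUIRED = ["name", "material", "color"]

def fill_rate(products, attribute):
    """Share of products with a non-empty value for the attribute (completeness)."""
    filled = sum(1 for p in products if p.get(attribute, "").strip())
    return filled / len(products)

def value_variants(products, attribute):
    """Distinct raw spellings per normalized value (consistency).
    More than one spelling per key signals data that needs cleanup."""
    variants = {}
    for p in products:
        raw = p.get(attribute, "").strip()
        if not raw:
            continue
        key = raw.lower().replace("-", "")
        variants.setdefault(key, set()).add(raw)
    return variants

for attr in REQUIRED:
    print(attr, fill_rate(products, attr))

# "polyester" shows up in three spellings: a consistency issue to fix at the source
print(value_variants(products, "material"))
```

A check like this, run before implementation starts, shows where the data already supports automation and where cleanup or fallback rules will be needed.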

If these factors are met, the conditions for automated text creation are ideal. But reality is not always so accommodating. Does that mean NLG has to be abandoned in such cases?

Tips and tricks of content automation

Even if the data situation does not look optimal at first glance, efficient workarounds are possible. Fallbacks, for example, are the key when data is missing. If a certain attribute is not filled for a product, the corresponding sentence can either be skipped or, to preserve text length, replaced by a generically worded variant that does not use the attribute. Alternatively, it can be checked whether another variable is a suitable substitute. If the existing attributes need to be described in more detail, or if the data yields too few USPs, benefits can be researched for individual products, attributes, or product clusters.
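The fallback chain described above can be sketched in a few lines. Attribute names and sentence templates here are illustrative assumptions, not a real text-engine API:

```python
def material_sentence(product):
    """Build a sentence about the material, falling back step by step:
    1. use the attribute if it is filled,
    2. try a substitute variable,
    3. emit a generic variant so text length is preserved."""
    material = product.get("material", "").strip()
    if material:
        return f"The jacket is made of {material}."
    fabric = product.get("fabric", "").strip()  # hypothetical substitute variable
    if fabric:
        return f"The jacket is made of {fabric}."
    # generic fallback that does not use the attribute at all
    return "The jacket is made of a hard-wearing material."

print(material_sentence({"material": "polyester"}))
print(material_sentence({"fabric": "cotton"}))
print(material_sentence({}))
```

The important design point is that every branch yields a grammatical sentence, so a gap in the data never produces a broken or empty fragment in the finished text.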

Even heterogeneously maintained data can be remedied by means of manipulations: individual cases are rewritten so that they fit into the text flow. One source of error must be kept in mind, however. If new products are added after the implementation is complete, and they contain an inconsistently maintained value that the existing manipulations do not cover, it will slip through uncaught. Such a value must then be maintained in the engine afterwards to restore good text quality.
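A minimal sketch of such manipulations, with assumed example values: known inconsistent spellings are mapped onto the form that fits the text flow, and any new, uncovered spelling is surfaced for maintenance instead of slipping into the generated text.

```python
# Manipulation rules for known inconsistent spellings (illustrative values).
MANIPULATIONS = {
    "Polyester": "polyester",
    "poly-ester": "polyester",
}

def normalize(raw):
    """Return the text-flow-ready form of a raw attribute value."""
    raw = raw.strip()
    if raw in MANIPULATIONS:
        return MANIPULATIONS[raw]
    if raw.islower() and raw.isalpha():
        return raw  # already consistent, no manipulation needed
    # uncovered spelling: flag it for maintenance rather than
    # emitting a broken sentence
    raise ValueError(f"unmapped value {raw!r}: add a manipulation rule")

print(normalize("poly-ester"))
print(normalize("cotton"))
```

Raising an error for unmapped values makes the failure mode described above visible early: a newly added product with an inconsistent value triggers maintenance instead of silently degrading text quality.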

Getting the best out of product data and NLG

Well-maintained data is indispensable for automated texts of high expressiveness and quality. The criteria described above (completeness, availability, granularity, and consistency) serve as guideposts for assessing data quality and for individually automated text generation. A PIM system is recommended for establishing the appropriate data structure and an efficient automation process.

However, the ideal state is not always given, and NLG can react to that as well. With fallbacks and manipulations, isolated data gaps and inconsistent values can be intercepted, so you still get a good reading flow and helpful product descriptions that add value for your customers. It goes without saying that this has a positive effect on the conversion rate: only satisfied customers lead to more sales.

How we can support you

Creating automated content from product data that adds value for your customers requires a range of competencies and skills. We would be happy to advise you together with our solution partner hmmh AG. hmmh supports you with proven experts who have the technical know-how for working with automation platforms and PIM systems. That is how the quality of your texts wins over your target group. We look forward to hearing from you.