Generate data

Generate a dataset based on the specified schema, examples, and requirements. The data generation model learns from the provided examples and produces a dataset that matches the specified schema.

Arguments

schema_definition
dict

A dictionary that specifies the schema of the output dataset. It must have the following format:
```
{
    "<col_1_name>": {"type": "<col_1_type>"},
    "<col_2_name>": {"type": "<col_2_type>"},
    ...
    "<col_n_name>": {"type": "<col_n_type>"}
}
```
where each `<col_n_type>` must be one of `"integer"`, `"float"`, or `"string"`.
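To catch schema mistakes before submitting a job, a client-side check can be run on `schema_definition`. The sketch below is a hypothetical helper (`check_schema` is not part of the Synthex SDK) that enforces only the format described above: every column maps to a dict with a `"type"` key whose value is one of the three supported types. The SDK's own validation may be stricter or looser.

```python
# Hypothetical helper, not part of the Synthex SDK: checks the documented format only.
ALLOWED_TYPES = {"integer", "float", "string"}


def check_schema(schema_definition: dict) -> None:
    """Raise ValueError if schema_definition does not follow the documented format."""
    if not isinstance(schema_definition, dict) or not schema_definition:
        raise ValueError("schema_definition must be a non-empty dict")
    for column, spec in schema_definition.items():
        if not isinstance(column, str):
            raise ValueError(f"column name {column!r} must be a string")
        # Each column must map to a dict such as {"type": "float"}.
        if not isinstance(spec, dict) or "type" not in spec:
            raise ValueError(f"column {column!r} must map to a dict like {{'type': ...}}")
        if spec["type"] not in ALLOWED_TYPES:
            raise ValueError(f"column {column!r} has unsupported type {spec['type']!r}")


schema = {
    "surface": {"type": "float"},
    "city": {"type": "string"},
}
check_schema(schema)  # passes silently
```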
examples
list[dict]

A list of dictionaries containing a few sample datapoints (3 to 5 are usually enough) that help the data generation model understand what the output data should look like. Each dictionary must follow the schema specified in the `schema_definition` parameter; otherwise, an exception is raised.
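Since a mismatched example raises an exception, it can help to check examples locally first. The following is a minimal sketch under stated assumptions: `check_examples` is a hypothetical helper, it assumes every example must contain exactly the schema's columns, and it assumes a whole number such as `104` is acceptable in a `"float"` column. The SDK's actual rules may differ.

```python
# Hypothetical client-side check; the SDK performs its own validation and its
# exact rules may differ from these assumptions.
PYTHON_TYPES = {
    "integer": (int,),
    "float": (float, int),  # assumption: whole numbers are acceptable floats
    "string": (str,),
}


def check_examples(examples: list, schema_definition: dict) -> None:
    """Raise ValueError if any example does not match schema_definition."""
    expected_columns = set(schema_definition)
    for i, example in enumerate(examples):
        # Assumption: each example has exactly the schema's columns, no more, no fewer.
        if set(example) != expected_columns:
            raise ValueError(
                f"example {i} has columns {sorted(example)}, expected {sorted(expected_columns)}"
            )
        for column, spec in schema_definition.items():
            value = example[column]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
                raise ValueError(f"example {i}: {column!r} should be {spec['type']}, got {value!r}")


schema = {"number_of_rooms": {"type": "integer"}, "city": {"type": "string"}}
check_examples([{"number_of_rooms": 3, "city": "Nashville"}], schema)  # passes silently
```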
requirements
list[str]

A list of strings, where each string specifies a requirement or constraint for the job. Pass an empty list if there are no specific requirements.
output_path
str

A string that specifies the path where the output dataset will be written. It does not need to include a file name; if none is provided, one is added automatically. If a file name is provided and its extension is consistent with the `output_type` parameter, the provided `output_path` is used in its entirety. If the extension is not consistent with `output_type`, it is replaced with one that is.
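The path rules above can be sketched as a small function. This is a hypothetical re-implementation for illustration only: `resolve_output_path` and `DEFAULT_FILE_NAME` are assumptions (the SDK may pick a different default name), and it assumes a path with no extension is treated as a directory. `PurePosixPath` is used so the results are the same on every operating system.

```python
from pathlib import PurePosixPath

# Assumption: the SDK's real default file name is not documented here.
DEFAULT_FILE_NAME = "output"


def resolve_output_path(output_path: str, output_type: str) -> str:
    """Apply the documented output_path rules (hypothetical re-implementation)."""
    path = PurePosixPath(output_path)
    extension = "." + output_type
    if path.suffix == "":
        # No file name given: treat the path as a directory and add a file name.
        return str(path / (DEFAULT_FILE_NAME + extension))
    if path.suffix == extension:
        # Extension already consistent with output_type: use the path as-is.
        return str(path)
    # Inconsistent extension: replace it with one matching output_type.
    return str(path.with_suffix(extension))


print(resolve_output_path("output_data", "csv"))             # output_data/output.csv
print(resolve_output_path("output_data/houses.csv", "csv"))  # output_data/houses.csv
print(resolve_output_path("output_data/houses.txt", "csv"))  # output_data/houses.csv
```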
number_of_samples
int

An integer that specifies the number of datapoints the model should generate. The maximum number of datapoints per job depends on whether you are on a free or a paid plan.
output_type
str

A string that specifies the format of the output dataset. Only "csv" (which generates a .csv file) is currently supported, but we will add more options soon.

Response

An Action Result object.

Python

```python
from synthex import Synthex

client = Synthex()

client.jobs.generate_data(
    # Column names mapped to their types ("integer", "float" or "string")
    schema_definition={
        "surface": {"type": "float"},
        "number_of_rooms": {"type": "integer"},
        "construction_year": {"type": "integer"},
        "city": {"type": "string"},
        "market_price": {"type": "float"}
    },
    # A few sample datapoints, each following the schema above
    examples=[
        {
            "surface": 104.00,
            "number_of_rooms": 3,
            "construction_year": 1985,
            "city": "Nashville",
            "market_price": 218000.00
        },
        {
            "surface": 98.00,
            "number_of_rooms": 2,
            "construction_year": 1999,
            "city": "Springfield",
            "market_price": 177000.00
        },
        {
            "surface": 52.00,
            "number_of_rooms": 1,
            "construction_year": 2014,
            "city": "Denver",
            "market_price": 230000.00
        }
    ],
    # Free-text constraints on the generated data
    requirements=[
        "The 'market_price' field should be realistic and should depend on the characteristics of the property.",
        "The 'city' field should specify cities in the USA, and the USA only."
    ],
    output_path="output_data/output.csv",
    number_of_samples=100,
    output_type="csv"
)
```

200 Response

```json
{
    "success": true,
    "message": "Job started successfully. Output will be saved to 'output_data/output.csv' upon completion."
}
```
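Once the job has completed, the generated CSV can be loaded with Python's standard `csv` module and converted back to the schema's types. The sketch below assumes the file has a header row whose column names match `schema_definition` and that integer columns are written without a decimal part; `parse_rows` is a hypothetical helper, not an SDK function. It accepts any iterable of lines, so it works with `io.StringIO` as well as with a file opened via `open("output_data/output.csv", newline="")`.

```python
import csv
import io

# Converters for the three documented column types.
CONVERTERS = {"integer": int, "float": float, "string": str}


def parse_rows(lines, schema_definition: dict) -> list:
    """Parse CSV lines into dicts typed according to schema_definition.

    Assumes the first row is a header containing the schema's column names.
    """
    reader = csv.DictReader(lines)
    return [
        {column: CONVERTERS[spec["type"]](row[column]) for column, spec in schema_definition.items()}
        for row in reader
    ]


schema = {
    "number_of_rooms": {"type": "integer"},
    "city": {"type": "string"},
    "market_price": {"type": "float"},
}
sample = io.StringIO("number_of_rooms,city,market_price\n3,Nashville,218000.0\n")
print(parse_rows(sample, schema))
# [{'number_of_rooms': 3, 'city': 'Nashville', 'market_price': 218000.0}]
```

Because the job runs asynchronously, the file only exists after the job finishes, so reading it immediately after `generate_data` returns may fail.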