Generate data
Generate a dataset with the specified schema, examples and requirements. The model will learn from the provided examples and generate a dataset that matches the specified schema.
Arguments
- schema_definition dict
A dictionary which specifies the output dataset's schema. It must have the following format:
{
    "<col_1_name>": {"type": "<col_1_type>"},
    "<col_2_name>": {"type": "<col_2_type>"},
    ...
    "<col_n_name>": {"type": "<col_n_type>"}
}
where the possible values of <col_n_type> are integer, float and string.
- examples list[dict]
A list of dictionaries, which specifies a few (3 to 5 are enough) sample datapoints that will help the data generation model understand what the output data should look like. Each dictionary must follow the schema specified in the schema_definition parameter, or an exception will be raised.
- requirements list[str]
A list of strings, where each string specifies a requirement or constraint for the job. It must be an empty list if no specific requirements are present.
- output_path str
A string which specifies the path where the output dataset will be generated. It does not need to specify a file name, as this will be added automatically if one is not provided. If a file name is specified, its extension must be consistent with the output_type parameter. If this is the case, the provided output_path is used in its entirety. Otherwise, the provided extension is replaced with one that is consistent with output_type.
- number_of_samples int
An integer which specifies the number of datapoints that the model should generate. The maximum number of datapoints you can generate with a single job depends on whether you are on a free or paid plan.
- output_type str
A string which specifies the format of the output dataset. Only "csv" (meaning a .csv file will be generated) is supported at this time, but we will soon add more options.
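Since examples that do not follow schema_definition raise an exception, it can be convenient to check them locally before starting a job. The following is a minimal sketch of such a check, not part of the Synthex library: the helper name validate_examples and the TYPE_CHECKS table are hypothetical, and accepting whole numbers for float columns is an assumption that the service may not share.

```python
# Hypothetical pre-flight check; this helper is NOT part of the Synthex library.
# Assumption: a whole number is acceptable for a "float" column.
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate_examples(schema_definition, examples):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Every column must declare one of the three documented types.
    for name, spec in schema_definition.items():
        if spec.get("type") not in TYPE_CHECKS:
            problems.append(f"column '{name}': unsupported type {spec.get('type')!r}")

    for i, example in enumerate(examples):
        # Columns must match the schema exactly: none missing, none extra.
        for name in sorted(schema_definition.keys() - example.keys()):
            problems.append(f"example {i}: missing column '{name}'")
        for name in sorted(example.keys() - schema_definition.keys()):
            problems.append(f"example {i}: unexpected column '{name}'")

        # Values must match the declared type of their column.
        for name, spec in schema_definition.items():
            check = TYPE_CHECKS.get(spec.get("type"))
            if check is not None and name in example and not check(example[name]):
                problems.append(
                    f"example {i}: column '{name}' is not of type {spec['type']}"
                )

    return problems
```

Calling validate_examples(schema_definition, examples) before generate_data and stopping when the returned list is non-empty surfaces schema mismatches without starting a job.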
- Python
from synthex import Synthex

client = Synthex()

client.jobs.generate_data(
    # Column names and types of the output dataset
    schema_definition = {
        "surface": {"type": "float"},
        "number_of_rooms": {"type": "integer"},
        "construction_year": {"type": "integer"},
        "city": {"type": "string"},
        "market_price": {"type": "float"}
    },
    # A few sample datapoints that follow the schema above
    examples = [
        {
            "surface": 104.00,
            "number_of_rooms": 3,
            "construction_year": 1985,
            "city": "Nashville",
            "market_price": 218000.00
        },
        {
            "surface": 98.00,
            "number_of_rooms": 2,
            "construction_year": 1999,
            "city": "Springfield",
            "market_price": 177000.00
        },
        {
            "surface": 52.00,
            "number_of_rooms": 1,
            "construction_year": 2014,
            "city": "Denver",
            "market_price": 230000.00
        }
    ],
    # Constraints for the job; pass an empty list if there are none
    requirements = [
        "The 'market_price' field should be realistic and should depend on the characteristics of the property.",
        "The 'city' field should specify cities in the USA, and the USA only"
    ],
    output_path = "output_data/output.csv",
    number_of_samples = 100,
    output_type = "csv"
)
- 200 Response
{
    "success": true,
    "message": "Job started successfully. Output will be saved to 'output_data/output.csv' upon completion."
}
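The path reported in the response message follows the output_path rules described under Arguments. Those rules can be sketched as below. This is an illustration only, not the library's implementation: the function name resolve_output_path and the default file name "output" are hypothetical, and the sketch assumes that a path without an extension refers to a directory.

```python
from pathlib import PurePosixPath

# Hypothetical default; the file name that is actually added is not documented here.
DEFAULT_FILE_NAME = "output"


def resolve_output_path(output_path, output_type="csv"):
    """Sketch of how output_path could be reconciled with output_type."""
    path = PurePosixPath(output_path)
    wanted_suffix = f".{output_type}"

    if path.suffix == "":
        # Assumption: no extension means no file name was given,
        # so a default file name is appended to the directory.
        return str(path / f"{DEFAULT_FILE_NAME}{wanted_suffix}")

    if path.suffix == wanted_suffix:
        # The extension is consistent with output_type: use the path as provided.
        return str(path)

    # The extension is inconsistent: replace it with the expected one.
    return str(path.with_suffix(wanted_suffix))
```

Under these assumptions, "output_data/output.csv" is kept unchanged, while "output_data/output.json" has its extension replaced with ".csv".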