Data plays a critical role in fields such as data analysis, machine learning, data mining, and artificial intelligence. However, collecting large amounts of real data can be time-consuming and expensive. Generating fake data with a tool like the Mimesis module in Python can therefore be an efficient alternative.
The Mimesis module is a Python library for generating fake data of many types, including personal information, dates, addresses, and much more. It is designed to produce realistic-looking data for testing and development environments. It can also be used for data anonymization, data masking, and data augmentation.
To get started with Mimesis, install it using pip:
pip install mimesis
To install a specific version of the mimesis module:
pip install mimesis==4.1.3
This article uses version 4.1.3 of Mimesis.
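The examples in this article target 4.1.3, so it can be worth confirming which version is installed before running them. Below is a minimal sketch that uses only the standard library's importlib.metadata module (Python 3.8+). The installed_version helper is a hypothetical name for this illustration and is not part of Mimesis:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mimesis"):
    """Hypothetical helper: return the installed version of package, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# prints the installed version string, or None if mimesis is not installed
print(installed_version())
```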
Once it is installed, you can import the module into your Python script and start generating fake data. For example, to generate a fake name, you can use the following code:
```python
# import the Person class from mimesis
from mimesis import Person

# create a Person object
person = Person()

# generate a random full name
name = person.full_name()
print(name)
# Example output (random, differs on every run): Ezra Phelps
```
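The code above creates an instance of the Person class and calls its full_name() method to generate a random full name. The Person() constructor accepts a locale such as 'en' and an optional seed for reproducible results. Gender is not a constructor option. You pass it per call instead, for example person.full_name(gender=Gender.MALE), with Gender imported from mimesis.enums.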
Mimesis can also generate many other kinds of data, such as dates, addresses, phone numbers, and email addresses. For example, to generate a random email address, you can use the following code:
```python
# import the Person class from mimesis
from mimesis import Person

# create a Person object
person = Person()

# generate a random email address
email = person.email()
print(email)
# Output: a random email address, different on every run
```
The code above creates a Person instance and uses its email() method to generate a random email address. Other Person methods, such as telephone() and occupation(), work the same way. Separate provider classes, such as Address and Datetime, cover further data types.
Mimesis can also generate large amounts of data quickly. For example, to generate a list of fake names, you can use the following code:
```python
# import the Person class from mimesis
from mimesis import Person

# create a single Person object and reuse it for every name
person = Person()
names = [person.full_name() for _ in range(10)]
print(names)
```
Output:
['Doreatha Decker', 'Edison Stokes', 'Willis Keller', 'Milan Aguilar', 'Wilhemina Phelps', 'Anjelica Mcdonald', 'Brinda Duncan', 'Marth Blevins', 'Levi Cameron', 'Rodger Booth']
This code builds a list of 10 random full names by calling full_name() inside a list comprehension. Change the argument to range() to generate a larger or smaller list.
The following complete example generates ten fake personal records, prints each one, and serializes the whole list to JSON:
```python
# import the required modules, classes and enums
import json

import mimesis
from mimesis.enums import Gender

# create the providers once and reuse them inside the loop
person_obj = mimesis.Person()
time_obj = mimesis.Datetime("en")  # English-locale date/time provider

# number of records to generate
num_records = 10

# list that will hold one dictionary per record
records_list = []

for _ in range(num_records):
    # generate the details of one (male) person
    person_name = person_obj.full_name(gender=Gender.MALE)
    person_age = person_obj.age()
    person_blood_type = person_obj.blood_type()
    person_email = person_obj.email()
    person_occupation = person_obj.occupation()
    person_contact_num = person_obj.telephone()
    # random datetime formatted as YYYY-MM-DD
    date_str = time_obj.datetime().strftime("%Y-%m-%d")

    # print the details of the person
    print("Details are:-", "Name:", person_name,
          "|Age:", person_age,
          "|Blood Type:", person_blood_type,
          "|Email id:", person_email,
          "|Occupation:", person_occupation,
          "|Contact Number:", person_contact_num,
          "|time:", date_str)

    # store the details of the person in a dictionary
    record_dict = {"name": person_name, "age": person_age,
                   "blood": person_blood_type, "email": person_email,
                   "occupation": person_occupation,
                   "contact": person_contact_num, "time": date_str}
    records_list.append(record_dict)

# serialize the list of dictionaries to a JSON string and print it
json_records = json.dumps(records_list)
print(json_records)
```
Output:
Details are:- Name: Jackson Norton |Age: 33 |Blood Type: O+ |Email id: ... |Occupation: Gaming Board Inspector |Contact Number: 050.963.3729 |time: 2018-05-27
Details are:- Name: Jimmy Mejia |Age: 56 |Blood Type: B+ |Email id: ... |Occupation: Editor |Contact Number: +1-(493)-502-3406 |time: 2014-01-22
Details are:- Name: Arlen Garner |Age: 51 |Blood Type: AB− |Email id: ... |Occupation: Foster Parent |Contact Number: 1-896-578-5431 |time: 2008-10-08
Details are:- Name: Connie Hubbard |Age: 54 |Blood Type: A− |Email id: ... |Occupation: Merchant Banker |Contact Number: (008) 148-2632 |time: 2003-11-23
Details are:- Name: Landon Chase |Age: 19 |Blood Type: AB− |Email id: ... |Occupation: Goods Handler |Contact Number: 633-482-0357 |time: 2017-11-06
Details are:- Name: Theron Conrad |Age: 30 |Blood Type: A− |Email id: ... |Occupation: Zoo Keeper |Contact Number: (157) 119-6992 |time: 2008-01-13
Details are:- Name: Keven Graham |Age: 58 |Blood Type: B+ |Email id: ... |Occupation: Racing Organiser |Contact Number: 1-901-872-0091 |time: 2005-03-28
Details are:- Name: Wilfred Mcintyre |Age: 36 |Blood Type: AB− |Email id: ... |Occupation: Pet Minder |Contact Number: +1-(756)-263-2409 |time: 2022-05-25
Details are:- Name: Eldon Richardson |Age: 52 |Blood Type: B+ |Email id: ... |Occupation: Saw Miller |Contact Number: 011-350-3928 |time: 2006-06-21
Details are:- Name: Aubrey Compton |Age: 63 |Blood Type: B− |Email id: ... |Occupation: Grave Digger |Contact Number: (796) 515-1329 |time: 2008-12-05
[{"name": "Jackson Norton", "age": 33, "blood": "O+", "email": "...", "occupation": "Gaming Board Inspector", "contact": "050.963.3729", "time": "2018-05-27"}, {"name": "Jimmy Mejia", "age": 56, "blood": "B+", "email": "...", "occupation": "Editor", "contact": "+1-(493)-502-3406", "time": "2014-01-22"}, {"name": "Arlen Garner", "age": 51, "blood": "AB\u2212", "email": "...", "occupation": "Foster Parent", "contact": "1-896-578-5431", "time": "2008-10-08"}, {"name": "Connie Hubbard", "age": 54, "blood": "A\u2212", "email": "...", "occupation": "Merchant Banker", "contact": "(008) 148-2632", "time": "2003-11-23"}, {"name": "Landon Chase", "age": 19, "blood": "AB\u2212", "email": "...", "occupation": "Goods Handler", "contact": "633-482-0357", "time": "2017-11-06"}, {"name": "Theron Conrad", "age": 30, "blood": "A\u2212", "email": "...", "occupation": "Zoo Keeper", "contact": "(157) 119-6992", "time": "2008-01-13"}, {"name": "Keven Graham", "age": 58, "blood": "B+", "email": "...", "occupation": "Racing Organiser", "contact": "1-901-872-0091", "time": "2005-03-28"}, {"name": "Wilfred Mcintyre", "age": 36, "blood": "AB\u2212", "email": "...", "occupation": "Pet Minder", "contact": "+1-(756)-263-2409", "time": "2022-05-25"}, {"name": "Eldon Richardson", "age": 52, "blood": "B+", "email": "...", "occupation": "Saw Miller", "contact": "011-350-3928", "time": "2006-06-21"}, {"name": "Aubrey Compton", "age": 63, "blood": "B\u2212", "email": "...", "occupation": "Grave Digger", "contact": "(796) 515-1329", "time": "2008-12-05"}]
In conclusion, the Mimesis module offers a fast and convenient way to generate large amounts of realistic-looking fake data for testing and development. Its providers cover a wide range of data types, making it a useful tool for data analysts and developers alike.