With more iterations, we can see more modes (since larger numbers of occurrences of the outlier are rarer), but all the confidence intervals are fairly close.
In the case of bootstrap, adding more iterations doesn't lead to overfitting (because each iteration is independent). I would think of it as increasing the resolution of your image.
Since our sample is small, running many simulations doesn't take much time. Even 1 million bootstrap iterations take around 1 minute.
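To see the "resolution" analogy in action, here is a minimal self-contained sketch (with a made-up sample containing one outlier, not the article's dataset): the percentile bounds wobble at low iteration counts and stabilise as the count grows, converging to the same interval rather than overfitting.

```python
import random
import statistics

random.seed(42)

# A small made-up sample with one outlier, similar in spirit to the running data.
sample = [10, 12, 14, 11, 13, 12, 15, 11, 14, 13, 12, 100]

def bootstrap_ci(data, num_iter, confidence=0.95):
    """Percentile bootstrap confidence interval for the mean."""
    means = []
    for _ in range(num_iter):
        # resample with replacement, same size as the original sample
        resample = [random.choice(data) for _ in data]
        means.append(statistics.mean(resample))
    means.sort()
    lower = means[int(len(means) * (1 - confidence) / 2)]
    upper = means[int(len(means) * (1 - (1 - confidence) / 2)) - 1]
    return lower, upper

# More iterations refine the same picture: the bounds stabilise as num_iter grows.
for num_iter in [100, 1000, 10000]:
    lo, hi = bootstrap_ci(sample, num_iter)
    print(num_iter, round(lo, 1), round(hi, 1))
```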
Estimating custom metrics
As we discussed, bootstrap is helpful when working with metrics that aren't as straightforward as averages. For example, you might want to estimate the median or the share of tasks closed within SLA.
You can even use bootstrap for something more unusual. Imagine you want to give customers discounts if your delivery is late: a 5% discount for a 15-minute delay, 10% for a 1-hour delay and 20% for a 3-hour delay.
Getting a confidence interval for such cases theoretically using plain statistics can be challenging, so bootstrap can be extremely helpful.
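As a sketch of how this would work, here is a minimal self-contained example with made-up delay data (the discount tiers match the scheme above; all numbers are illustrative assumptions):

```python
import random
import statistics

random.seed(0)

def discount_for_delay(delay_minutes):
    """Discount tiers from the example: 15 min -> 5%, 1 h -> 10%, 3 h -> 20%."""
    if delay_minutes >= 180:
        return 0.20
    if delay_minutes >= 60:
        return 0.10
    if delay_minutes >= 15:
        return 0.05
    return 0.0

# Hypothetical delivery delays in minutes (made-up data for illustration).
delays = [0, 5, 20, 0, 75, 10, 200, 0, 30, 0, 12, 90]

def bootstrap_avg_discount_ci(data, num_iter=10000, confidence=0.95):
    """Percentile bootstrap CI for the average discount per order."""
    stats = []
    for _ in range(num_iter):
        resample = [random.choice(data) for _ in data]
        stats.append(statistics.mean(discount_for_delay(d) for d in resample))
    stats.sort()
    lower = stats[int(num_iter * (1 - confidence) / 2)]
    upper = stats[int(num_iter * (1 - (1 - confidence) / 2)) - 1]
    return lower, upper

lo, hi = bootstrap_avg_discount_ci(delays)
print('95%% CI for average discount per order: (%.3f, %.3f)' % (lo, hi))
```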
Let's return to our running program and estimate the share of refunds (when a customer ran 150 km but didn't manage to finish the marathon). We will use a similar function but will calculate the refund share for each iteration instead of the mean value.
import pandas as pd
import tqdm
import matplotlib.pyplot as plt

def get_refund_share_confidence_interval(num_batches, confidence = 0.95):
    # Running simulations
    tmp = []
    for i in tqdm.tqdm(range(num_batches)):
        tmp_df = df.sample(df.shape[0], replace = True)
        tmp_df['refund'] = list(map(
            lambda kms, finished: 1 if (kms >= 150) and (finished == 0) else 0,
            tmp_df.kms_during_program,
            tmp_df.finished_marathon
        ))
        tmp.append(
            {
                'iteration': i,
                'refund_share': tmp_df.refund.mean()
            }
        )

    # Saving data
    bootstrap_df = pd.DataFrame(tmp)

    # Calculating the confidence interval
    lower_bound = bootstrap_df.refund_share.quantile((1 - confidence)/2)
    upper_bound = bootstrap_df.refund_share.quantile(1 - (1 - confidence)/2)

    # Making a chart
    ax = bootstrap_df.refund_share.hist(bins = 50, alpha = 0.6,
        color = 'purple')
    ax.set_title('Share of refunds, iterations = %d' % num_batches)
    plt.axvline(x=lower_bound, color='navy', linestyle='--',
        label='lower bound = %.2f' % lower_bound)
    plt.axvline(x=upper_bound, color='navy', linestyle='--',
        label='upper bound = %.2f' % upper_bound)
    ax.annotate('CI lower bound: %.2f' % lower_bound,
        xy=(lower_bound, ax.get_ylim()[1]),
        xytext=(-10, -20),
        textcoords='offset points',
        ha='center', va='top',
        color='navy', rotation=90)
    ax.annotate('CI upper bound: %.2f' % upper_bound,
        xy=(upper_bound, ax.get_ylim()[1]),
        xytext=(-10, -20),
        textcoords='offset points',
        ha='center', va='top',
        color='navy', rotation=90)
    plt.xlim(-0.1, 1)
    plt.show()
Even with 12 examples, we got a 2+ times smaller confidence interval. We can conclude with 95% confidence that less than 42% of customers will be eligible for a refund.
That's a solid result for such a small amount of data. However, we can go even further and try to get an estimation of causal effects.
Estimation of effects
We have data about each customer's previous races before this marathon, and we can check how this value is correlated with the expected distance. We can use bootstrap for this as well. We only need to add a linear regression step to our current process.
import statsmodels.formula.api as smf

def get_races_coef_confidence_interval(num_batches, confidence = 0.95):
    # Running simulations
    tmp = []
    for i in tqdm.tqdm(range(num_batches)):
        tmp_df = df.sample(df.shape[0], replace = True)
        # Linear regression model
        model = smf.ols('kms_during_program ~ races_before', data = tmp_df).fit()
        tmp.append(
            {
                'iteration': i,
                'races_coef': model.params['races_before']
            }
        )

    # Saving data
    bootstrap_df = pd.DataFrame(tmp)

    # Calculating the confidence interval
    lower_bound = bootstrap_df.races_coef.quantile((1 - confidence)/2)
    upper_bound = bootstrap_df.races_coef.quantile(1 - (1 - confidence)/2)

    # Making a chart
    ax = bootstrap_df.races_coef.hist(bins = 50, alpha = 0.6, color = 'purple')
    ax.set_title('Coefficient between kms during the program and previous races, iterations = %d' % num_batches)
    plt.axvline(x=lower_bound, color='navy', linestyle='--', label='lower bound = %.2f' % lower_bound)
    plt.axvline(x=upper_bound, color='navy', linestyle='--', label='upper bound = %.2f' % upper_bound)
    ax.annotate('CI lower bound: %.2f' % lower_bound,
        xy=(lower_bound, ax.get_ylim()[1]),
        xytext=(-10, -20),
        textcoords='offset points',
        ha='center', va='top',
        color='navy', rotation=90)
    ax.annotate('CI upper bound: %.2f' % upper_bound,
        xy=(upper_bound, ax.get_ylim()[1]),
        xytext=(10, -20),
        textcoords='offset points',
        ha='center', va='top',
        color='navy', rotation=90)
    # plt.legend()
    plt.xlim(ax.get_xlim()[0] - 5, ax.get_xlim()[1] + 5)
    plt.show()
    return bootstrap_df
We can look at the distribution. The confidence interval is above 0, so we can say there's an effect with 95% confidence.
You can spot that the distribution is bimodal, and each mode corresponds to one of the scenarios:
- The component around 12 is related to samples without the outlier: it's an estimation of the effect of previous races on the expected distance during the program if we disregard the outlier.
- The second component corresponds to the samples where one or several copies of the outlier were in the dataset.
So, it's pretty neat that we can make estimations even for different scenarios if we look at the bootstrap distribution.
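The sizes of the two modes are easy to reason about: a bootstrap resample of n points drawn with replacement misses any given point with probability (1 − 1/n)^n, which is close to 1/e for moderate n. A quick self-contained check for n = 12 (the sample size in the example):

```python
import random

random.seed(1)

n = 12  # sample size from the example

# Analytical probability that a given point (e.g. the outlier) is absent
# from a bootstrap resample of size n drawn with replacement.
p_absent = (1 - 1/n) ** n
print('P(outlier absent) = %.3f' % p_absent)  # ~0.352, close to 1/e

# Quick simulation to confirm: draw n indices, check index 0 (the outlier)
# never appears.
num_iter = 100_000
absent = sum(
    1 for _ in range(num_iter)
    if all(random.randrange(n) != 0 for _ in range(n))
)
print('simulated: %.3f' % (absent / num_iter))
```

So roughly 35% of bootstrap iterations fall into the "no outlier" mode, which matches the relative weight of the left component in the histogram.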
We've learned how to use bootstrap with observational data, but its bread and butter is A/B testing. So, let's move on to our second example.
The other everyday use case for bootstrap is designing and analysing A/B tests. Let's look at an example. It will also be based on a synthetic dataset, which shows the effect of a discount on customer retention. Imagine we're working on an e-grocery product and want to test whether our marketing campaign with a 20 EUR discount will affect customers' spending.
For each customer, we know their country of residence, the number of family members that live with them, the average annual salary in the country, and how much money they spend on products in our store.
Power analysis
First, we need to design the experiment and understand how many clients we need in each experiment group to make conclusions confidently. This step is called power analysis.
Let's quickly recap the basic statistical theory behind A/B tests and their fundamental metrics. Every test is based on the null hypothesis (which represents the current status quo). In our case, the null hypothesis is "the discount doesn't affect customers' spending on our product". Then, we need to collect data on customers' spending for the control and experiment groups and estimate the probability of seeing such or more extreme results if the null hypothesis is valid. This probability is called the p-value, and if it's small enough, we can conclude that we have enough data to reject the null hypothesis and say that the treatment affects customers' spending or retention.
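To make "the probability of such or more extreme results under the null" concrete, here is a small self-contained permutation-test sketch with made-up spending numbers (not the article's dataset): under the null hypothesis the group labels are exchangeable, so we shuffle them and see how often a shuffled difference is at least as large as the observed one.

```python
import random
import statistics

random.seed(7)

# Hypothetical spending for control and treatment groups (made-up numbers).
control = [54, 61, 48, 57, 50, 66, 52, 59, 47, 63]
treatment = [68, 75, 59, 71, 64, 80, 66, 73, 61, 77]

observed_diff = statistics.mean(treatment) - statistics.mean(control)

# Permutation test: shuffle labels and count how often the shuffled
# difference is as extreme as the observed one.
pooled = control + treatment
num_perm = 10_000
extreme = 0
for _ in range(num_perm):
    random.shuffle(pooled)
    perm_diff = (statistics.mean(pooled[len(control):])
                 - statistics.mean(pooled[:len(control)]))
    if perm_diff >= observed_diff:
        extreme += 1

p_value = extreme / num_perm
print('observed diff = %.1f, p-value = %.4f' % (observed_diff, p_value))
```

With these illustrative numbers the p-value comes out far below 5%, so we would reject the null hypothesis.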
In this framework, there are three main metrics:
- effect size: the minimal change in our metric we want to be able to detect,
- statistical significance equals the false positive rate (the probability of rejecting the null hypothesis when there was no effect). The most commonly used significance is 5%. However, you may choose other values depending on your false-positive tolerance. For example, if implementing the change is expensive, you might want to use a lower significance threshold.
- statistical power shows the probability of rejecting the null hypothesis given that we actually had an effect equal to or larger than the effect size. People often use an 80% threshold, but in some cases (i.e. when you want to be more confident that there are no negative effects), you might use 90% or even 99%.
We need all these values to estimate the number of clients in the experiment. Let's try to define them in our case to understand their meaning better.
We will start with the effect size:
- we expect the retention rate to change by at least 3 percentage points as a result of our campaign,
- we would like to spot changes in customers' spending of 20 or more EUR.
For statistical significance, I will use the default 5% threshold (so if we see an effect in the A/B test analysis, we can be 95% confident that the effect is present). Let's target a 90% statistical power threshold so that if there's an actual effect equal to or bigger than the effect size, we will spot this change in 90% of cases.
Let's start with statistical formulas, which will allow us to get estimations quickly. Statistical formulas assume that our variable has a specific distribution, but they can usually help you estimate the order of magnitude of the number of samples. Later, we will use bootstrap to get more accurate results.
For retention, we can use the standard test of proportions. We need to know the baseline value to estimate the normed effect size. We can get it from historical data from before the experiment.
import statsmodels.stats.power as stat_power
import statsmodels.stats.proportion as stat_prop

base_retention = before_df.retention.mean()
ret_effect_size = stat_prop.proportion_effectsize(base_retention + 0.03,
    base_retention)

sample_size = 2*stat_power.tt_ind_solve_power(
    effect_size = ret_effect_size,
    alpha = 0.05, power = 0.9,
    nobs1 = None, # we specified nobs1 as None to get an estimation for it
    alternative='larger'
)
# ret_effect_size = 0.0632, sample_size = 8573.86
We used a one-sided test because, from the business perspective, there's no difference between a negative effect and no effect: we won't implement the change in either case. Using a one-sided instead of a two-sided test increases the statistical power.
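You can see the one-sided gain directly with a normal-approximation sketch using only the standard library (this is a rough back-of-the-envelope version of what statsmodels' t-based solver computes above; the numbers come out nearly identical):

```python
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.9, one_sided=True):
    """Normal-approximation sample size per group for a two-sample test:
    n = 2 * ((z_alpha + z_power) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

d = 0.0632  # normed effect size for retention from the example
n_one = sample_size_per_group(d)
n_two = sample_size_per_group(d, one_sided=False)
print('one-sided per group:', round(n_one))
print('two-sided per group:', round(n_two))
```

The one-sided test needs noticeably fewer clients per group, and doubling the one-sided per-group number lands close to the ~8574 total estimated above.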
We can similarly estimate the sample size for the customer value metric, assuming a normal distribution. However, the distribution isn't actually normal, so we should expect more precise results from bootstrap.
Let's write the code.
val_effect_size = 20/before_df.customer_value.std()

sample_size = 2*stat_power.tt_ind_solve_power(
    effect_size = val_effect_size,
    alpha = 0.05, power = 0.9,
    nobs1 = None,
    alternative='larger'
)
# val_effect_size = 0.0527, sample_size = 12324.13
We got estimations of the needed sample sizes for each test. However, there are cases when you have a limited number of clients and want to understand what statistical power you can get.
Suppose we have only 5K customers (2.5K in each group). Then, we will be able to achieve 72.2% statistical power for the retention analysis and 58.7% for customer value (given the desired statistical significance and effect sizes).
The only difference in the code is that this time, we've specified nobs1 = 2500 and left power as None.
stat_power.tt_ind_solve_power(
    effect_size = ret_effect_size,
    alpha = 0.05, power = None,
    nobs1 = 2500,
    alternative='larger'
)
# 0.7223

stat_power.tt_ind_solve_power(
    effect_size = val_effect_size,
    alpha = 0.05, power = None,
    nobs1 = 2500,
    alternative='larger'
)
# 0.5867
Now, it's time to use bootstrap for the power analysis, and we'll start with the customer value test since it's easier to implement.
Let's discuss the basic idea and steps of power analysis using bootstrap. First, we need to define our goal clearly. We want to estimate the statistical power depending on the sample size. In more practical terms, we want to know the share of cases where there was an increase in customer spending of 20 or more EUR and we were able to reject the null hypothesis and implement this change in production. So, we need to simulate a bunch of such experiments and calculate the share of cases where we see statistically significant changes in our metric.
Let's look at one experiment and break it into steps. The first step is to generate the experimental data. For that, we need to get a random subset of the population equal to the sample size, randomly split these customers into control and experiment groups, and add an effect equal to the effect size to the treatment group. All this logic is implemented in the get_sample_for_value function below.
import numpy as np

def get_sample_for_value(pop_df, sample_size, effect_size):
    # getting a sample of the needed size
    sample_df = pop_df.sample(sample_size)

    # randomly assign treatment
    sample_df['treatment'] = sample_df.index.map(
        lambda x: 1 if np.random.uniform() > 0.5 else 0)

    # add effect for the treatment group
    sample_df['predicted_value'] = sample_df['customer_value'] + effect_size * sample_df.treatment
    return sample_df
Now, we can treat this synthetic experiment data as we usually would in an A/B test analysis: run a bunch of bootstrap simulations, estimate effects, and then get a confidence interval for the effect.
We will be using linear regression to estimate the effect of the treatment. As discussed in the previous article, it's worth adding to the linear regression features that explain the outcome variable (customers' spending). We will add the number of family members and the average salary to the regression since they are positively correlated with it.
import statsmodels.formula.api as smf

val_model = smf.ols('customer_value ~ num_family_members + country_avg_annual_earning',
    data = before_df).fit()
val_model.summary().tables[1]
We will put all the logic of running multiple bootstrap simulations and estimating treatment effects into the get_ci_for_value function.
def get_ci_for_value(df, boot_iters, confidence_level):
    tmp_data = []

    for iter in range(boot_iters):
        sample_df = df.sample(df.shape[0], replace = True)
        val_model = smf.ols('predicted_value ~ treatment + num_family_members + country_avg_annual_earning',
            data = sample_df).fit()
        tmp_data.append(
            {
                'iteration': iter,
                'coef': val_model.params['treatment']
            }
        )

    coef_df = pd.DataFrame(tmp_data)
    return (coef_df.coef.quantile((1 - confidence_level)/2),
        coef_df.coef.quantile(1 - (1 - confidence_level)/2))
The next step is to put this logic together, run a bunch of such synthetic experiments, and save the results.
def run_simulations_for_value(pop_df, sample_size, effect_size,
        boot_iters, confidence_level, num_simulations):
    tmp_data = []

    for sim in tqdm.tqdm(range(num_simulations)):
        sample_df = get_sample_for_value(pop_df, sample_size, effect_size)
        num_users_treatment = sample_df[sample_df.treatment == 1].shape[0]
        value_treatment = sample_df[sample_df.treatment == 1].predicted_value.mean()
        num_users_control = sample_df[sample_df.treatment == 0].shape[0]
        value_control = sample_df[sample_df.treatment == 0].predicted_value.mean()
        ci_lower, ci_upper = get_ci_for_value(sample_df, boot_iters, confidence_level)
        tmp_data.append(
            {
                'experiment_id': sim,
                'num_users_treatment': num_users_treatment,
                'value_treatment': value_treatment,
                'num_users_control': num_users_control,
                'value_control': value_control,
                'sample_size': sample_size,
                'effect_size': effect_size,
                'boot_iters': boot_iters,
                'confidence_level': confidence_level,
                'ci_lower': ci_lower,
                'ci_upper': ci_upper
            }
        )
    return pd.DataFrame(tmp_data)
Let's run this simulation for sample_size = 100 and look at the results.
val_sim_df = run_simulations_for_value(before_df, sample_size = 100,
    effect_size = 20, boot_iters = 1000, confidence_level = 0.95,
    num_simulations = 20)
val_sim_df.set_index('experiment_id')[['sample_size', 'ci_lower', 'ci_upper']].head()
We have the following data for 20 simulated experiments. We know the confidence interval for each experiment, and now we can estimate the power.
We would have rejected the null hypothesis if the lower bound of the confidence interval had been above zero, so let's calculate the share of such experiments.
val_sim_df['successful_experiment'] = val_sim_df.ci_lower.map(
    lambda x: 1 if x > 0 else 0)

val_sim_df.groupby(['sample_size', 'effect_size']).aggregate(
    {
        'successful_experiment': 'mean',
        'experiment_id': 'count'
    }
)
We've started with just 20 simulated experiments and 1000 bootstrap iterations to estimate each confidence interval. Such a small number of simulations can give us a low-resolution picture fairly quickly. Keeping in mind the estimation we got from classic statistics, we should expect that sample sizes around 10K will give us the desired statistical power.
tmp_dfs = []
for sample_size in [100, 250, 500, 1000, 2500, 5000, 10000, 25000]:
    print('Simulation for sample size = %d' % sample_size)
    tmp_dfs.append(
        run_simulations_for_value(before_df, sample_size = sample_size, effect_size = 20,
            boot_iters = 1000, confidence_level = 0.95, num_simulations = 20)
    )

val_lowres_sim_df = pd.concat(tmp_dfs)
We got results similar to our theoretical estimations. Let's try running estimations with more simulated experiments (100 and 500 experiments). We can see that 12.5K clients will be enough to achieve 90% statistical power.
I've added all the power analysis results to the chart so that we can see the relationship clearly.
At this point, you can already see that bootstrap can take a significant amount of time. For example, precisely estimating power with 500 experiment simulations for just 3 sample sizes took me almost 2 hours.
Now, we can estimate the relationship between effect size and power for a 12.5K sample size.
tmp_dfs = []
for effect_size in [1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100]:
    print('Simulation for effect size = %d' % effect_size)
    tmp_dfs.append(
        run_simulations_for_value(before_df, sample_size = 12500, effect_size = effect_size,
            boot_iters = 1000, confidence_level = 0.95, num_simulations = 100)
    )

val_effect_size_sim_df = pd.concat(tmp_dfs)
We can see that if the actual effect on customers' spending is bigger than 20 EUR, we will get even higher statistical power, and we will be able to reject the null hypothesis in more than 90% of cases. But we will be able to spot a 10 EUR effect in less than 50% of cases.
Let's move on and conduct the power analysis for retention as well. The complete code is structured similarly to the customer spending analysis. We will discuss the nuances in detail below.
import tqdm

def get_sample_for_retention(pop_df, sample_size, effect_size):
    base_ret_model = smf.logit('retention ~ num_family_members', data = pop_df).fit(disp = 0)
    tmp_pop_df = pop_df.copy()
    tmp_pop_df['predicted_retention_proba'] = base_ret_model.predict()
    sample_df = tmp_pop_df.sample(sample_size)
    sample_df['treatment'] = sample_df.index.map(lambda x: 1 if np.random.uniform() > 0.5 else 0)
    sample_df['predicted_retention_proba'] = sample_df['predicted_retention_proba'] + effect_size * sample_df.treatment
    sample_df['retention'] = sample_df.predicted_retention_proba.map(lambda x: 1 if x >= np.random.uniform() else 0)
    return sample_df
def get_ci_for_retention(df, boot_iters, confidence_level):
    tmp_data = []
    for iter in range(boot_iters):
        sample_df = df.sample(df.shape[0], replace = True)
        ret_model = smf.logit('retention ~ treatment + num_family_members', data = sample_df).fit(disp = 0)
        tmp_data.append(
            {
                'iteration': iter,
                'coef': ret_model.params['treatment']
            }
        )
    coef_df = pd.DataFrame(tmp_data)
    return coef_df.coef.quantile((1 - confidence_level)/2), coef_df.coef.quantile(1 - (1 - confidence_level)/2)
def run_simulations_for_retention(pop_df, sample_size, effect_size,
        boot_iters, confidence_level, num_simulations):
    tmp_data = []
    for sim in tqdm.tqdm(range(num_simulations)):
        sample_df = get_sample_for_retention(pop_df, sample_size, effect_size)
        num_users_treatment = sample_df[sample_df.treatment == 1].shape[0]
        retention_treatment = sample_df[sample_df.treatment == 1].retention.mean()
        num_users_control = sample_df[sample_df.treatment == 0].shape[0]
        retention_control = sample_df[sample_df.treatment == 0].retention.mean()
        ci_lower, ci_upper = get_ci_for_retention(sample_df, boot_iters, confidence_level)
        tmp_data.append(
            {
                'experiment_id': sim,
                'num_users_treatment': num_users_treatment,
                'retention_treatment': retention_treatment,
                'num_users_control': num_users_control,
                'retention_control': retention_control,
                'sample_size': sample_size,
                'effect_size': effect_size,
                'boot_iters': boot_iters,
                'confidence_level': confidence_level,
                'ci_lower': ci_lower,
                'ci_upper': ci_upper
            }
        )
    return pd.DataFrame(tmp_data)
First, since we have a binary outcome for retention (whether the customer returns next month or not), we will use a logistic regression model instead of linear regression. We can see that retention is correlated with the size of the family. It might be the case that when you buy many different types of products for family members, it's harder to find another service that can cover all your needs.
base_ret_model = smf.logit('retention ~ num_family_members', data = before_df).fit(disp = 0)
base_ret_model.summary().tables[1]
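To see why logistic regression is the right tool for a binary outcome, here is a toy illustration with made-up numbers: an ordinary least-squares line fitted to 0/1 data happily predicts "probabilities" outside the [0, 1] range, which logistic regression avoids by construction.

```python
# Made-up data: x = number of family members, y = retained (0/1).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 1, 1, 1, 1]

# Closed-form simple linear regression (least squares).
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

preds = [intercept + slope * x for x in xs]
print(['%.2f' % p for p in preds])
# The linear fit predicts a value above 1 for large families, i.e. an
# impossible "probability"; logistic regression keeps predictions in (0, 1).
print(max(preds) > 1)
```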
Also, the function get_sample_for_retention has slightly trickier logic to adjust the results for the treatment group. Let's look at it step by step.
First, we fit a logistic regression on the whole population data and use this model to predict the probability of retaining.
base_ret_model = smf.logit('retention ~ num_family_members', data = pop_df).fit(disp = 0)
tmp_pop_df = pop_df.copy()
tmp_pop_df['predicted_retention_proba'] = base_ret_model.predict()
Then, we get a random sample equal to the sample size and split it into a control and a test group.
sample_df = tmp_pop_df.sample(sample_size)
sample_df['treatment'] = sample_df.index.map(
    lambda x: 1 if np.random.uniform() > 0.5 else 0)
For the treatment group, we increase the probability of retaining by the expected effect size.
sample_df['predicted_retention_proba'] = sample_df['predicted_retention_proba'] + effect_size * sample_df.treatment
The last step is to define, based on this probability, whether the customer is retained or not. We use the uniform distribution (a random number between 0 and 1) for that:
- if a random value from the uniform distribution is below the probability, then the customer is retained (this happens with the specified probability),
- otherwise, the customer has churned.
sample_df['retention'] = sample_df.predicted_retention_proba.map(
    lambda x: 1 if x > np.random.uniform() else 0)
You can run a few simulations to make sure our sampling function works as intended. For example, with this call, we can see that retention in the control group equals 64%, as in the population, and it's 93.7% for the experiment group (as expected with effect_size = 0.3).
get_sample_for_retention(before_df, 10000, 0.3).groupby('treatment', as_index = False).retention.mean()
# |    |   treatment |   retention |
# |---:|------------:|------------:|
# |  0 |           0 |    0.640057 |
# |  1 |           1 |    0.937648 |
Now, we can also run simulations to find the number of samples needed to reach 90% statistical power for retention. We can see that a 12.5K sample size will also be sufficient for retention.
Analysing results
We can use linear or logistic regression to analyse the results, or leverage the functions we already have for bootstrap confidence intervals.
value_model = smf.ols(
    'customer_value ~ treatment + num_family_members + country_avg_annual_earning',
    data = experiment_df).fit()
value_model.summary().tables[1]
So, we got a statistically significant result for customer spending equal to 25.84 EUR with a 95% confidence interval of (16.82, 34.87).
With the bootstrap function, the CI will be pretty close.
get_ci_for_value(experiment_df.rename(
    columns = {'customer_value': 'predicted_value'}), 1000, 0.95)
# (16.28, 34.63)
Similarly, we can use logistic regression for the retention analysis.
retention_model = smf.logit('retention ~ treatment + num_family_members',
    data = experiment_df).fit(disp = 0)
retention_model.summary().tables[1]
Again, the bootstrap approach gives close estimations for the CI.
get_ci_for_retention(experiment_df, 1000, 0.95)
# (0.072, 0.187)
With logistic regression, it can be tricky to interpret the coefficient directly. However, we can use a hacky approach: for each customer in our dataset, calculate the predicted probability as if the customer were in control and as if they were in treatment, and then look at the average difference between the probabilities.
experiment_df['treatment_eq_1'] = 1
experiment_df['treatment_eq_0'] = 0

experiment_df['retention_proba_treatment'] = retention_model.predict(
    experiment_df[['retention', 'treatment_eq_1', 'num_family_members']]
        .rename(columns = {'treatment_eq_1': 'treatment'}))
experiment_df['retention_proba_control'] = retention_model.predict(
    experiment_df[['retention', 'treatment_eq_0', 'num_family_members']]
        .rename(columns = {'treatment_eq_0': 'treatment'}))
experiment_df['proba_diff'] = (experiment_df.retention_proba_treatment
    - experiment_df.retention_proba_control)
experiment_df.proba_diff.mean()
# 0.0281
So, we can estimate the effect on retention to be 2.8 percentage points.
Congratulations! We've finally finished the full A/B test analysis and were able to estimate the effect both on average customer spending and on retention. Our experiment is successful, so in real life, we would start thinking about rolling it out to production.
You can find the full code for this example on GitHub.
Let me quickly recap what we've discussed today:
- The main idea of bootstrap is simulation with replacement from your sample, assuming that the general population has the same distribution as the data we have.
- Bootstrap shines in cases when you have few data points, your data has outliers or is far from any theoretical distribution. Bootstrap can also help you estimate custom metrics.
- You can use bootstrap to work with observational data, for example, to get confidence intervals for your values.
- Also, bootstrap is widely used in A/B testing analysis, both to estimate the impact of treatment and to do a power analysis when designing an experiment.
Thank you a lot for reading this article. If you have any follow-up questions or comments, please leave them in the comments section.
All the images are produced by the author unless otherwise stated.
This article was inspired by the book "Behavioral Data Analysis with R and Python" by Florent Buisson.