Stata Cheat Sheets

Stata cheat sheets give every user, new or experienced, a well-structured and useful guide to some of the basics of Stata. Produced by data practitioners Dr Tim Essam and Dr Laura Hughes, the cheat sheets cover topics ranging from data analysis to plotting in Stata.

Download the Stata cheat sheet and see for yourself.

Stata Student Licence

Are you currently enrolled in higher education and looking for a Stata licence at a reduced price?

Whether you want to buy Stata for the whole year or need a licence for the final semester, we have what you need. All you need is proof of student status (a copy of your university ID card).

Learn more about student licence options for Stata.

Annual and multi-year Stata licences offer users a lower upfront cost and ensure that you stay on the current version of Stata (18).

Stata 18 offers users many new features, including customizable tables, Bayesian econometrics, PyStata and much more. Click here to discover the new features in Stata 18.

Qualiopi

Timberlake Training constantly strives to offer its clients the best possible quality of courses and teaching. We are therefore proud to announce our partnership with Equancy, which will allow us to offer courses delivered by a Qualiopi-certified organisation.

If you would like to benefit from the Qualiopi process (available in France only), which allows your OPCO to cover training costs, please contact our partner Equancy here.

Customizable tables in Stata with Chuck Huber

Learn how to implement customizable tables in Stata with Chuck Huber (Director of Statistical Outreach, StataCorp). Throughout this useful guide, Chuck expands on the features of the table command.

The table command is flexible enough to create many types of tables: tabulations, tables of summary statistics, tables of regression results, and more. table can calculate summary statistics to display in the table, and the table can also include results from other Stata commands.
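As a minimal sketch of the summary-statistics use described above, the commands below build a small table from auto.dta, one of the example datasets shipped with Stata. The choice of dataset and variables is an assumption made for illustration and is not taken from Chuck Huber's guide; the syntax requires Stata 17 or later.

```stata
* Load an example dataset shipped with Stata (illustrative choice)
sysuse auto, clear

* One row per category of foreign, showing the number of cars
* and the mean and standard deviation of price
table foreign, statistic(frequency) statistic(mean price) statistic(sd price)
```

Each statistic() option adds one statistic to the table, so further summaries can be requested by repeating the option.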

This guide consists of seven parts, from the introduction of the command to the use of custom styles and labels:

1. Customizable tables in Stata, part 1: The new table command

2. Customizable tables in Stata, part 2: The new collect command

3. Customizable tables in Stata, part 3: The classic table 1

4. Customizable tables in Stata, part 4: Table of statistical tests

5. Customizable tables in Stata, part 5: Tables for one regression model

6. Customizable tables in Stata, part 6: Tables for multiple regression models

7. Customizable tables in Stata, part 7: Saving and using custom styles and labels

Timberlake is the Stata distributor for France

As of Monday 27 January 2020, Timberlake is the official distributor of Stata for France.

We have expanded our services to support Stata users throughout France.

We have distributed Stata since 1985 and currently provide sales, support and training for Stata software in the United Kingdom, Ireland, Spain, Portugal, the Middle East, North Africa, Brazil and Poland.

We offer fast service and support to all our customers and are pleased to now offer services to Stata users in France.

Contact us:

Timberlake S.A.S
48 Rue de Chateau d'eau
75010 PARIS

Upcoming FREE Stata webinars

We regularly run free Stata webinars, ranging from sessions for new Stata users or those considering buying Stata for their research up to intermediate-level sessions.

The full programme of free Stata webinars we offer is listed here.

Stata Tips #21 – Stata survival analysis commands with interval-censored event times

Stata's survival analysis commands with interval-censored event times

What is it for?

Often, time-to-event or survival data are gathered at particular observation times. A physician will detect the recurrence of cancer only when there is a follow-up appointment, and a biologist might know that a study animal in the wild has died when they visit the site, but not exactly when it happened. In either case, all we know is that the event happened between the two appointments or visits, which is to say that there is a lower limit and an upper limit to the survival time. In statistical parlance, data like this are interval-censored.

Stata version 15 includes a new command, stintreg, which provides you with the familiar streg parametric survival regressions, while allowing for interval-censored data.

An example with medical follow-up data

In this post, we'll apply stintreg to a dataset from David Collett's textbook, "Modelling Survival Data in Medical Research" (3rd edition) (Example 9.4). This is from a real-life study that examined breast retraction, a side-effect of breast cancer treatment, and compared women treated with radiotherapy alone and those treated with radiotherapy and chemotherapy. You can download the data in CSV form here and follow along with this do-file.
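A minimal sketch of loading and checking the data follows. It assumes the CSV has been saved in the working directory under the hypothetical name breast_retraction.csv, and that its three columns are named treatment, start and end, as the commands later in this post expect. It also assumes that a missing end value means no retraction had been seen by the last appointment.

```stata
* Hypothetical file name: replace with wherever you saved the CSV
import delimited "breast_retraction.csv", clear

* Check the three variables the post relies on
describe treatment start end
list treatment start end in 1/5

* Count women with no observed retraction
* (assumes a missing end marks a right-censored observation)
count if missing(end)

* Number of women in each treatment group
tabulate treatment
```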

There are three variables: treatment group, start time of the interval (the last appointment at which there was no sign of retraction) and end time of the interval (the first appointment where there was retraction). First, let's visualise the dataset. Time is on the vertical axis, and the women are ranked by start time from left to right, in the two treatment groups. The exact time when retraction appeared is somewhere in the grey lines. Ideally, we would see a lower risk of retraction as grey lines stretching off to the top of the chart without red dots, and high risk indicated by red dots low down in the chart. But, it is hard to judge whether there are any differences.

Stata graph

Let's fit some streg models, using naive approximations of the data so that they look like they are exactly known. First, we use the start time (the last retraction-free appointment).

gen naive = (end!=.)

stset start, failure(naive)

streg treatment, distribution(weibull)

Stata output

This shows a very strong and statistically significant detrimental effect of chemotherapy. Next, we use the mid-point between the two appointments:

gen naivetime = (start+end)/2

stset naivetime, failure(naive)

streg treatment, distribution(weibull)

Stata output

Oh dear! This shows an almost non-existent difference between the two groups, with no statistical significance. Which can we trust? We need to account for the interval censoring in a principled way, as part of the overall probabilistic model for the data, and this is where stintreg comes in.

stintreg treatment, interval(start end) distribution(weibull)

Stata output

Now, we can see that the treatment group difference is quite large, and quite compelling in terms of significance. Note that, because the models are very different, we can't compare the log-likelihoods.
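Models fitted by stintreg to the same interval data can, by contrast, be compared with one another. The sketch below refits the model under three of the distributions stintreg supports and lists their information criteria. The stored names m_weibull, m_lognormal and m_loglogistic are illustrative, and the choice of alternative distributions is an assumption rather than part of Collett's example.

```stata
* Fit the same interval-censored model under three distributions
stintreg treatment, interval(start end) distribution(weibull)
estimates store m_weibull

stintreg treatment, interval(start end) distribution(lognormal)
estimates store m_lognormal

stintreg treatment, interval(start end) distribution(loglogistic)
estimates store m_loglogistic

* AIC and BIC side by side; smaller values indicate a better
* trade-off between fit and complexity
estimates stats m_weibull m_lognormal m_loglogistic
```

Note that the Weibull model is reported in the proportional-hazards metric by default, while the lognormal and loglogistic models use the accelerated failure-time metric, so the treatment coefficients are not on the same scale across the three fits.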

Why does this matter?

Interval-censored data are everywhere. Being able to account for the uncertainty in the data gives more honest answers, not only by avoiding bias but by having the right level of uncertainty. If you fill in the "blank" with a single number, then you give Stata the false impression that the data are exactly known, and the inferences (p-values, confidence intervals) that follow will be falsely precise as a result.

Stata Tips #20 – Power Analysis

The ultimate question in applied research is: does event A cause event B? The search for causal relationships has pushed applied scientists to rely more and more on field experiments. When setting up an experiment to detect a treatment effect (the causal impact), researchers need to determine the required sample size. Stata's power command gives users tremendous flexibility in computing sample size and power, and in graphing the results. In the following example we compute the sample size for a specific power; we could equally have computed the power for a specific sample size.

Assume that we want to measure the impact of an innovative teaching program on General Educational Development (GED) completion rates for a young population (between the ages of 17 and 25). To do so, we need to run a randomized controlled trial and randomly assign participants to the innovative program. We know from the literature that the proportion of the population in question with a GED is around 66%. Experts in the education field expect the new program to increase GED completion rates by 20 percentage points (to 86%). How many study participants do we need to test this hypothesis?

Using the command below Stata helps identify the total number of participants (and also the number in each group: treatment and control) needed in the study.

power twoproportions 0.66 0.86, power(0.8) alpha(0.05)

The twoproportions method is used because we are comparing two proportions: 0.66 is the proportion of the population with a GED, and 0.86 is the proportion we expect after completing the program. power() sets the power of the test and alpha() sets the significance level; both values used here are Stata's defaults.

Performing iteration ...

Estimated sample sizes for a two-sample proportions test
Pearson's chi-squared test
Ho: p2 = p1 versus Ha: p2 != p1

Study parameters:

alpha = 0.0500
power = 0.8000
delta = 0.2000 (difference)
p1 = 0.6600
p2 = 0.8600

Estimated sample sizes:

N = 142
N per group = 71

The power analysis tells us that we need to recruit 142 participants: 71 enrolled in the new program and the remaining 71 serving as a control group.

Suppose we now relax the 0.8 power assumption and allow four different values, ranging from a low of 0.6 to a high of 0.9:
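As a quick sanity check, the calculation can be run in reverse: supplying the total sample size with n() instead of power() asks Stata for the power achieved. The second command is a sketch of an unequal allocation; the 2:1 split is a hypothetical design choice and not part of the example above.

```stata
* Power achieved with the recommended total sample size of 142
power twoproportions 0.66 0.86, n(142) alpha(0.05)

* Hypothetical unequal allocation: nratio() is N2/N1, so 0.5 places
* twice as many participants in group 1 (control) as in group 2
power twoproportions 0.66 0.86, power(0.8) alpha(0.05) nratio(0.5)
```

Because the sample size is rounded up to a whole number of participants per group, the first command should report a power slightly above 0.8.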

power twoproportions 0.66 0.86, power(0.6 0.7 0.8 0.9) alpha(0.05)

Performing iteration ...

Estimated sample sizes for a two-sample proportions test
Pearson's chi-squared test
Ho: p2 = p1 versus Ha: p2 != p1

+-----------------------------------------------------------------+
|   alpha   power       N      N1      N2   delta      p1      p2 |
|-----------------------------------------------------------------|
|     .05      .6      90      45      45      .2     .66     .86 |
|     .05      .7     112      56      56      .2     .66     .86 |
|     .05      .8     142      71      71      .2     .66     .86 |
|     .05      .9     188      94      94      .2     .66     .86 |
+-----------------------------------------------------------------+

The command gives the sample size for each of the four scenarios, from a low of 90 participants to a high of 188. As expected, the higher the power of the test, the larger the required sample size.
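The same list of power values can be written more compactly with Stata's numlist shorthand, and several significance levels can be explored at once. This is a sketch: the extra alpha value of 0.01 is an assumption added for illustration.

```stata
* 0.6(0.1)0.9 expands to 0.6 0.7 0.8 0.9
power twoproportions 0.66 0.86, power(0.6(0.1)0.9) alpha(0.05)

* Every combination of the listed alpha and power values gets its own row
power twoproportions 0.66 0.86, power(0.6(0.1)0.9) alpha(0.01 0.05)
```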

We can graph the above table using the below command:

power twoproportions 0.66 0.86, power(0.6 0.7 0.8 0.9) alpha(0.05) graph

[Image: Stata graph of estimated total sample size against power]

In some cases, the sample size is predetermined. For a variety of reasons, such as program eligibility or financial constraints, the number of participants in a study is known and fixed. In the above example, assume the number of participants eligible for the innovative education program is 120. How much power does this sample size give us for a range of program effect sizes? As a reminder, the power of a test is the probability of making the correct decision (in other words, rejecting the null hypothesis) when the null hypothesis is actually false.

power twoproportions 0.66 (0.71 0.76 0.81 0.86), n(120) alpha(0.05) graph

In the above command, we fix the total sample size at 120 and specify four different effect sizes, each increasing by 5 percentage points from the initial 66% GED completion rate.

[Image: Stata graph of estimated power for each assumed effect size at N = 120]

The resulting graph suggests low power, the highest being 73% for an increase of 20 percentage points in the GED completion rate. This is not surprising, given that the first command required 142 participants to detect the same effect with 80% power. With fewer participants, we expect lower power.
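When the sample size is fixed, a natural follow-up question is how large an effect could be detected at the conventional 80% power. A minimal sketch: supplying only the control-group proportion together with n() and power() asks power twoproportions to solve for the experimental-group proportion instead.

```stata
* Solve for the target proportion p2, given the control proportion,
* a fixed total sample size of 120, and 80% power
power twoproportions 0.66, n(120) power(0.8) alpha(0.05)
```

By default Stata looks for a target proportion above the control proportion, which matches the expected direction of the program's effect here.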