- Embed and execute Python code
- Use Python interactively
- Define and use Python routines in do-files
- Define and use Python routines in ado-files
- Interact with Stata through Stata Function Interface (sfi)
<<dd_do>>
python:
print('Hello World!')
end
<</dd_do>>
Just as when using Python directly, indentation is important when entering Python code in Stata.
<<dd_do>>
python:
sum = 0
for i in range(7):
sum = sum + i
print(sum)
end
<</dd_do>>
<<dd_do>>
python:
from functools import reduce
from sfi import Data, Macro
stata: quietly sysuse auto, clear
sum = reduce((lambda x, y: x + y), Data.get(var='price'))
Macro.setLocal('sum', str(sum))
end
display "sum of var price is : `sum'"
<</dd_do>>
<<dd_do>>
python:
sum1 = reduce((lambda x, y: x + y), Data.get(var='rep78'))
sum1
sum2 = reduce((lambda x, y: x + y), Data.get(var='rep78', selectvar=-1))
sum2
end
<</dd_do>>
- Pandas
- Numpy
- BeautifulSoup, lxml
- Matplotlib
- Scikit-Learn, Tensorflow, Keras
- NLTK
<<dd_do>>
python:
import numpy as np
from sfi import Platform
import matplotlib
if Platform.isWindows():
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
from sfi import Data
end
<</dd_do>>
We use the Stata sandstone example dataset.
<<dd_do>>
use https://www.stata-press.com/data/r16/sandstone, clear
* Use sfi to get data from Stata
python:
D = np.array(Data.get("northing easting depth"))
end
<</dd_do>>
We use a triangular mesh to produce the graph.
python:
ax = plt.axes(projection='3d')
plt.xticks(np.arange(60000, 90001, step=10000))
plt.yticks(np.arange(30000, 50001, step=5000))
ax.plot_trisurf(D[:,0], D[:,1], D[:,2], cmap='viridis', edgecolor='none')
plt.savefig("sandstone.png")
end
<<dd_do:quietly>> python: ax = plt.axes(projection='3d') plt.xticks(np.arange(60000, 90001, step=10000)) plt.yticks(np.arange(30000, 50001, step=5000)) ax.plot_trisurf(D[:,0], D[:,1], D[:,2], cmap='viridis', edgecolor='none') plt.savefig("sandstone.png") end <</dd_do>>
python:
ax.plot_trisurf(D[:,0], D[:,1], D[:,2],
cmap=plt.cm.Spectral, edgecolor='none')
ax.view_init(30, 60)
plt.savefig("sandstone1.png")
end
<<dd_do:quietly>> python: ax.plot_trisurf(D[:,0], D[:,1], D[:,2], cmap=plt.cm.Spectral, edgecolor='none') ax.view_init(30, 60) plt.savefig("sandstone1.png") end <</dd_do>>
Animation (do-file)
We scrape Nasdaq 100 stock tickers with pandas.
python:
import pandas as pd
data = pd.read_html("https://en.wikipedia.org/wiki/NASDAQ-100")
df = data[2]
df = df.drop(df.index[0])
t = df.values.tolist()
end
python:
from sfi import Data
Data.addObs(len(t))
stata: gen company = ""
stata: gen ticker = ""
Data.store(None, range(len(t)), t)
end
<<dd_do: quietly>> use stata/nas100ticker.dta, clear <</dd_do>>
<<dd_do>>
list in 1/5, clean
<</dd_do>>
We use a Python script to get detailed information of a Nasdaq 100 stock, for example ATVI.
use nas100ticker, clear
quietly describe
frame create detail
forvalues i = 1/`r(N)' {
local a = ticker[`i']
local b detail
python script nas1detail.py, args(`a' `b')
sleep 100
}
frame detail : save nasd100detail.dta, replace
<<dd_do: quietly>> use stata/nasd100detail.dta, clear <</dd_do>>
<<dd_do>>
list ticker open_price open_date close_price close_date in 1/5, clean
<</dd_do>>
pysvm (ado file)
program pysvm
version 16
syntax varlist, predict(name)
gettoken label feature : varlist
python: dosvm("`label'", "`feature'", "`predict'")
end
Python routine in ado file
python:
from sfi import Data
import numpy as np
from sklearn.svm import SVC
def dosvm(label, features, predict):
X = np.array(Data.get(features))
y = np.array(Data.get(label))
svc_clf = SVC(gamma='auto')
svc_clf.fit(X, y)
y_pred = svc_clf.predict(X)
Data.addVarByte(predict)
Data.store(predict, None, y_pred)
end
<<dd_do: quietly>> adopath + ./stata <</dd_do>>
<<dd_do>>
sysuse auto
pysvm foreign mpg price, predict(for2)
<</dd_do>>
<<dd_do>>
label values for2 origin
tabulate foreign for2, nokey
<</dd_do>>
<<dd_do: quietly>> sysuse auto, clear set seed 12345 <</dd_do>>
<<dd_do>>
pysvm2 foreign mpg price if runiform() <= 0.2
pysvm2predict for2
<</dd_do>>
<<dd_do>>
label values for2 origin
tabulate foreign for2, nokey
<</dd_do>>
Train program (pysvm2.ado)
program pysvm2
version 16
syntax varlist(min=3) [if] [in]
gettoken label features : varlist
marksample touse
qui count if `touse'
if r(N) == 0 {
di as error "no observations"
exit 2000
}
qui summarize `label' if `touse'
if r(min) >= r(max) {
di as error "outcome does not vary"
exit 2000
}
quietly python: dosvm2("`label'", "`features'", "`touse'")
di as text "note: training finished successfully"
end
Python routine in pysvm2.ado
python:
import sys
from sfi import Data, Macro
import numpy as np
from sklearn.svm import SVC
import __main__
def dosvm2(label, features, select):
X = np.array(Data.get(features, selectvar=select))
y = np.array(Data.get(label, selectvar=select))
svc_clf = SVC(gamma='auto')
svc_clf.fit(X, y)
__main__.svc_clf = svc_clf
Macro.setGlobal('e(svm_features)', features)
return svc_clf
end
Predict program (pysvm2predict.ado)
program pysvm2predict
version 16
syntax anything [if] [in]
gettoken newvar rest : anything
if "`rest'" != "" {
exit 198
}
confirm new variable `newvar'
marksample touse
qui count if `touse'
if r(N) == 0 {
di as text "zero observations"
exit 2000
}
qui replace `touse' = _n if `touse' != 0
python: dosvmpredict2("`newvar'", "`touse'")
end
Python routine in pysvm2predict.ado
python:
import sys
from sfi import Data, Macro
import numpy as np
from sklearn.svm import SVC
from __main__ import svc_clf
def dosvmpredict2(predict, select):
features = select + " "+ Macro.getGlobal('e(svm_features)')
X = np.array(Data.get(features, selectvar=select))
y_pred = svc_clf.predict(X[:,1:])
y1 = np.reshape(y_pred, (-1,1))
y = np.concatenate((X, y1), axis=1)
Data.addVarDouble(predict)
dim = y.shape[0]
j = y.shape[1]-1
for i in range(dim):
Data.storeAt(predict, y[i,0]-1, y[i,j])
end
In pysvm2.ado ado code:
...
import __main__
...
__main__.svc_clf = svc_clf
...
To access svc_clf in Python routines in ado files:
...
from __main__ import svc_clf
...
- sfi details and examples
- Stata Python documentation
- Stata Python integration
- The talk is made with Stata markdown and dynpandoc
- wordcloud do-file



