Real Estate Consulting Data Project


Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Recently, I took on the task of advising an imaginary real estate investment firm on the top 5 zip codes to invest in, based on insights from a dataset sourced from Zillow containing the average home sale price for each of 14,723 zip codes at a monthly frequency from April 1996 through April 2018. The analysis and modeling for the project were done using a combination of Python and R Jupyter notebooks. The data and project notebooks are available here in the project repository.

Since the data used for the project end in April 2018, it was appropriate to imagine that this was the time at which the consultation was provided; recent developments in the world are having widespread effects on financial and real estate markets, and there is no way to relate the project data to current events. It is important to keep in mind that macroeconomic and sociopolitical issues can cause unforeseen shocks in the market, and any attempt at forecasting should take into account the volatility clustering caused by such shocks. Fortunately, the data span a time frame which includes the 2007–2010 financial and housing market collapse, providing an opportunity to observe how well individual zip codes weathered the last storm and some insight into which ones could be expected to show resilience to adverse economic and market conditions in the future. A number of zip codes did not have complete data over the time period of the dataset, making proper comparisons of their behavior against those which did impossible, so they were left out of consideration.

The investment recommendations were based on two main criteria: a high expected capital appreciation rate with low expected variance, and the profitability of leasing the properties within the zip code over the investment horizon. To project capital appreciation rates, a method was devised to quickly generate an appropriate SARIMAX model for each zip code so that the forecasted mean and volatility of returns over the investment horizon could be compared. To compare the profitability of leasing properties within each zip code, a database was obtained from a 2017 Zillow research article containing P2R (price-to-rent) ratios for various metropolitan areas. These ratios can be compared to the national average and give an idea of how many years of rent payments would pay off the value of a property in a given area.

Code blocks will follow the workflow presented in this article for demonstration. For a more in-depth investigation into the methods employed, please see the Jupyter notebooks in the repository. This article will cover an efficient way to generate well-fit SARIMAX models for time series data within a Python workflow, fast enough that each zip code can be individually modeled and the models compared to find the zip codes with the best outlooks. The method combines the strengths of time series modeling packages from Python and R (statsmodels and forecast, respectively): the speedy model-generating auto.arima function from R's forecast package is called from within a Python workflow using the rpy2 package to obtain optimal orders and coefficients for a SARIMAX model, and a statsmodels SARIMAX model is then smoothed with the values obtained. This was found to consistently generate satisfactory models while avoiding the time-consuming process of grid searching for optimal model orders using loops, and it retains all of the statistical explicitness and functionality of statsmodels while working in a Python environment.

Please note that the contents of this article are intended only as a demonstration of a data science workflow and methods related to analyzing and modeling time-series data in the context of real estate, and are not to be taken as any form of investment advice. Unless otherwise noted, all images are the property of the author.

Loading in the data and initial observations

First, we need to import all the libraries to use; we will be using many standard Python data science packages, especially pandas and statsmodels.
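The notebook's import cell is not reproduced here, but a minimal set of imports for the workflow shown in this article would look roughly like the following (the project notebooks may import more):

```python
# Core data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Time series modeling and diagnostics in Python
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
```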

Now using pandas to read the data:
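Reading the file is a one-liner; the file name below is an assumption, so substitute the path to your copy of the Zillow data:

```python
# Read the raw Zillow export (file name assumed)
df = pd.read_csv('zillow_data.csv')
print(df.shape)  # expect (14723, 273): one row per zip code, one column per month plus identifiers
df.head()
```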

[Figure: first rows of the raw Zillow dataframe in wide format]

Looking at the raw data, we can see that there are two columns with 5-digit numbers to identify the "regions." A few quick Google searches reveal that the 'RegionName' column contains the zip codes; the other column is not useful to us, and testing also found that the SizeRank feature was not useful. We can also observe that the data are in what is called "wide format," where the dates and their corresponding values for each zip code are spread across columns that expand the dataframe to the right, which is why it has 273 columns. For our purposes, we will want the data in long format, which we can get with pandas' melt method.
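A sketch of that reshaping step, assuming the identifier columns are named as in the raw Zillow export:

```python
# Identifier columns in the wide-format export (names assumed from the raw file)
id_cols = ['RegionID', 'RegionName', 'City', 'State', 'Metro', 'CountyName', 'SizeRank']

# Melt the date columns into rows: one (zip code, date, value) row per month
melted = df.melt(id_vars=id_cols, var_name='date', value_name='value')
```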

[Figure: the dataframe after melting to long format]

This operation has duplicated each zip code for each date that was found in the columns, leading to a dataframe with 3,901,595 rows. It would be possible to use a MultiIndex DataFrame to work with panel data at this point, but I will use a GroupBy object here. First, the 'date' column needs to be converted to datetime, and we need to check for missing values.
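Those two steps might look like this:

```python
# Convert the date strings to proper datetimes
melted['date'] = pd.to_datetime(melted['date'])

# Count missing sale values
print(melted['value'].isna().sum())  # 156,891 missing values
```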

This last operation reveals that there are 156,891 missing values in the ‘value’ column, which we will need to keep in mind moving forward. Now, let’s create our GroupBy object, and take our first look at some zip codes.
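Grouping by zip code and plotting the first few groups could be done roughly as follows:

```python
# One group per zip code
groups = melted.groupby('RegionName')

# Plot the value history of the first six zip codes
fig, ax = plt.subplots(figsize=(12, 6))
for name in list(groups.groups.keys())[:6]:
    groups.get_group(name).set_index('date')['value'].plot(ax=ax, label=name)
ax.set_ylabel('Average sale price ($)')
ax.legend(title='Zip code')
plt.show()
```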

[Figure: average home value histories for the first six zip codes]

Having our first look at some data, we can see that the individual zip codes have many differences. Just looking at the first 6 out of 14,723, we can see differences in average value and growth rate. We can also see that there is a general trend that they are following, with rising values up until the subprime mortgage crisis between 2007 and 2010 crashed the market, and then a subsequent recovery beginning around 2013. Just looking at these 6, we can see that some zip codes weathered the storm better than others, and this would be an attractive feature for investors who realize that future downturns may be caused by unforeseen macroeconomic effects.

We now have some ideas about what we might look for when identifying the "5 best zip codes" to recommend to an investor: growth rate and resilience to a poor economic environment; but there are other factors to consider. First, one would feel most confident in an investment if the expected rate of growth is complemented by low expected volatility. Second, it would be best to lease the owned properties to renters during the holding period, so that the capital is not just sitting around doing nothing. This means that we should recommend zip codes that not only have strong expected growth, low expected variance, and resilience to poor economic conditions, but that also have better-than-average leasing profitability. To determine this, we can turn to a metric called the price-to-rent (P2R) ratio, which gives us an idea of the proportion between the price to purchase and own a property in a given area and the amount renters are willing to pay per year there. This Zillow research article by Jamie Anderson discusses further considerations real estate investors should make about the profitability of renting out properties, and it provides the data which we will import now.
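Assuming the price-to-rent table from that article was saved locally as a CSV (the file name is an assumption), it can be read the same way:

```python
# Price-to-rent ratios by metro area (file name assumed)
p2r = pd.read_csv('price_to_rent.csv')
p2r.head()
```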

[Figure: head of the price-to-rent dataframe from the Zillow research article]

We can see that the national average P2R at the time of the article (June 19, 2017, conveniently near the end of our data set) was 11.44. According to the article, the majority of homes in most "major markets" can be rented for a profit, so it stands to reason that a real estate investment in any zip code with a P2R below the national average will not only be profitable to the property owner as a rental, but will be more profitable than the average across the country.

Another rich source of real estate investment advice comes from this Financial Samurai article written by Sam Dogen. In the article, Dogen suggests investors follow what he calls a "key real estate investing rule": Buy Utility, Rent Luxury (BURL). Dogen explains that luxury properties (especially in coastal cities) tend not to have nearly the leasing profitability of utility homes (especially in the Midwest), and often have no leasing profitability at all, with the rent failing to cover the yearly costs of ownership. Dogen cites the Zillow research article from above and uses the P2R ratio to distinguish luxury from utility, labeling everything with a P2R below 9.6 as utility and anything with a P2R above 13.3 as luxury. This means that we should be looking for zip codes with P2R below the national average, and ideally below 9.6. Since expected capital appreciation is one of our criteria, we may not be able to find only zip codes with P2R below 9.6, because expected growth is often reflected in the price of capital, but this gives us a solid target for our contenders. It is also safe to assume that a zip code full of homes that would be considered utility would not have an average sale price over $500,000, so this will help narrow our list as well.

Modeling a single zip code

Before we can iterate through all of our zip codes and produce a model for each, we need a good way to quickly generate a well-fit model for a single zip code. Statsmodels gives us great functionality with SARIMAX models, but it does not contain a method to determine the optimal orders for such a model. Grid searching using loops and comparing AIC scores is painfully slow, especially if one wants to consider lags up to, say, 5 or 6 for the AR and MA terms. Luckily, the forecast package for R has an auto.arima function which can quickly find optimal orders and coefficients given a time series. With this, we have the option of either taking the suggested orders and having statsmodels estimate the coefficients, or smoothing a statsmodels SARIMAX of the suggested order with the given coefficients. It was found during this study that the coefficients estimated by the auto.arima function consistently produced lower AIC scores than those estimated using statsmodels, so we will make our models using the former. To see how we will do this, we first need to look at how we can call an R function from Python using rpy2. We will start by importing the necessary items.
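The rpy2 pieces needed are roughly the following; note that the forecast package must already be installed in the local R environment:

```python
# Bridge to R: the forecast package and R's ts() constructor
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import FloatVector

forecast = importr('forecast')  # auto.arima is exposed as forecast.auto_arima
stats = importr('stats')        # ts() for building R time series objects
```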

We can now use functions from the forecast package in our Python workflow, but making use of the outputs will take a bit of tinkering. The coefficients and their names are available in vectors attached to the returned object, but the orders are not stored in such a convenient way. Let's see what a printout of the return from the auto.arima function looks like, using the first zip code in our data set for the test. Note that we must cast the time series as a FloatVector so it is compatible with the R function. Also, take note of how the time series is accessed from the pandas GroupBy object: by getting a group by its name (zip code), setting the date column as the index, then taking the value column. Note as well that Python drops the leading zeros of these "lower" zip codes because they are numeric values, but we know each is really a 5-digit zip code with a zero on the front. In financial time series analysis, it is general practice to model the returns of a series rather than the prices, because this gives us a common scale on which to compare all of our zip codes. For mathematical convenience, log returns are generally favored over percentage changes, since they can be cumulatively summed over time. They are generated by taking the differenced log values of a series; this produces a NaN value on the first date of the series, which we drop. Finally, the frequency 'MS' is assigned to the resulting returns series to tell pandas that the data are monthly.
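Extracting the first zip code's series and converting it to log returns could look like this:

```python
# Pull the first zip code's monthly price series from the GroupBy object.
# The group key 1001 is really zip code 01001; Python drops the leading zero.
prices = groups.get_group(1001).set_index('date')['value']

# Log returns: differenced log prices, dropping the leading NaN,
# with monthly-start ('MS') frequency so pandas knows the data are monthly
returns = np.log(prices).diff().dropna().asfreq('MS')
returns.plot(figsize=(12, 4), title='Zip code 01001 - monthly log returns')
plt.show()
```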

[Figure: monthly log returns for the first zip code]

Here we can see the return stream of the first zip code. We can see that the series has trends, and the variance is not constant, meaning it is not stationary. The trends can be removed by another order of differencing, but the heteroscedasticity will remain, invariably leading to heteroscedastic errors in our model. Although the residuals will be heteroscedastic, as long as the errors are centered around zero across time, with a roughly normal distribution, and not serially correlated, the model will be mostly effective. The model estimates parameters by maximizing the log likelihood of the data, so what the inconsistent volatility will really affect is the estimate of sigma2 (variance) for the model, which will settle on a happy medium between the lower variance of the early years and the higher variance after around 2009. This matters for our forecasting because the model will generate confidence intervals based on a variance estimate that is somewhat biased toward an outdated, lower-variance market regime, leading to narrower confidence bands than might actually be appropriate.

However, despite this issue with forecasted confidence intervals, there is an advantage to using the entire time series to estimate model parameters when comparing models of many zip codes, even with heteroscedasticity present. Zip codes that did not handle the crash well experienced more intense volatility during that period, which leads to higher variance estimates in their models. This means that once all of the models are generated, those with more optimistic outlooks for future variance will be the ones fit to data without large spikes in volatility; in other words, those that showed resilience during the crash. The comparisons of the models therefore still work in our favor to find what we are looking for. One could adjust for the biased sigma2 estimate after the fact, changing just this parameter in the smoothing step to a value more reflective of the variance in recent years before forecasting, but since this is a hypothetical project with plenty to cover already, I will leave this out.

The traditional method for finding the orders for ARMA models is the Box-Jenkins method which uses the ACF and PACF to visually find spikes in autocorrelation. We will be using a more modern approach with the auto.arima function, but it would be interesting to take a look at the ACF and PACF for our first zip code for a visual reference along with the suggested orders. Remember the returns will need an order of differencing, so we apply this before looking at the autocorrelation to be modeled, and drop the leading NaN value.
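Plotting the ACF and PACF of the differenced returns is straightforward with statsmodels:

```python
# One more order of differencing, then inspect the autocorrelation structure
diffed = returns.diff().dropna()

fig, axes = plt.subplots(2, 1, figsize=(12, 6))
plot_acf(diffed, lags=36, ax=axes[0])
plot_pacf(diffed, lags=36, ax=axes[1])
plt.tight_layout()
plt.show()
```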

[Figure: ACF and PACF of the differenced log returns]

We can see there certainly is some serial correlation in the series, but it is difficult to tell exactly what orders would best model it. It looks like the yearly seasonality is evident at 12 lags, and there is some strong autocorrelation up to lag 5. Let’s see what the object returned from the auto.arima call looks like. Note that when making the time series object using the ts function in R, the frequency is given as 12 to indicate monthly data.
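Calling auto.arima through rpy2 and capturing its printed output might look like this; capture.output is used so the same text can be parsed later (one possible approach, not necessarily the exact code in the notebooks):

```python
# Build an R time series (frequency 12 = monthly) from the returns and fit it
r_ts = stats.ts(FloatVector(returns.values), frequency=12)
fit = forecast.auto_arima(r_ts)

# Capture R's printout of the fitted model as plain text
fit_text = '\n'.join(robjects.r['capture.output'](fit))
print(fit_text)
```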

Series: structure(c(-0.00265604405811537, -0.00177462335836864, -0.00266785396119396, -0.00178253166628295, -0.00178571476023492, -0.000894054596880522, -0.000894854645842713, 0, 0.00178890924272324, 0.00178571476023492, 0.00178253166628295, 0.0017793599000786, 0.002663117419484, 0.0017714796483812, 0.0026513493216207, 0.00264433825308963, 0.00263736416608573, 0.00263042676877312, 0.00262352577238545, 0.00261666089117085, 0.0034782643763247, 0.00346620797648711, 0.0025917941074276, 0.00258509407210461, 0.00171969087952739, 0.00171673861905397, 0.000857265376119187, 0.000856531101616653, 0.000855798083895465, 0.0017094021256483, 0.00170648505575954, 0.00170357792478271, 0.00254993763327249, 0.004235499766855, 0.00337553063128126, 0.00336417474563255, 0.00335289501031077, 0.00417537141048108, 0.00332779009267448, 0.00414422474602638, 0.00330305832925681, 0.00329218404347742, 0.00328138112317689, 0.00408664238545242, 0.00407000968829685, 0.00567032706008774, 0.00483482005458313, 0.00401123682645377, 0.00399521106728784, 0.0039793128514809, 0.00396354066245586, 0.00473560474583401, 0.00392927813988919, 0.00469484430420763, 0.00544960476756629, 0.00464756839654612, 0.00616334770766791, 0.0068886609951857, 0.00608366895361456, 0.00604688161489264, 0.00526119214336163, 0.00523365680611043, 0.00520640819057405, 0.00444116199996714, 0.00515654917924557, 0.00513009553102961, 0.00510391191817661, 0.0065241260769433, 0.00719945514285492, 0.00857148105014005, 0.00849863472146239, 0.00842701616188002, 0.00835659459094273, 0.00828734024856992, 0.00958255108099593, 0.00949159668157051, 0.00873367996875452, 0.00932097294306367, 0.0112027530765531, 0.0123739774874423, 0.00902067972593201, 0.00575633185257551, 0.00572338605268641, 0.00632113356336994, 0.00753299230754578, 0.00747667034301891, 0.00680485149838539, 0.0067588582951057, 0.00610502506680355, 0.0072771697738947, 0.00782429510752891, 0.0077635504899245, 0.00711324872451868, 0.0076493459184892, 0.00817284175587396, 0.00925932541279728, 0.0097449897009394, 0.010780246243792, 0.0112234623698484, 0.011650617219976, 0.0120615497338186, 0.0119178008640368, 0.0101795682134629, 0.0100769879503577, 0.0110208410142789, 0.0119326967118454, 0.0128108336717592, 0.0146578312983845, 0.014940516954951, 0.0156942897625889, 0.0149725086303825, 0.01380645591966, 0.012685159527317, 0.0120651115518431, 0.0100964602510096, 0.00682596507039968, 0.0049762599862273, 0.00270392233240102, 0.00134922440076579, 0.000449337235149727, -0.000898876465017295, -0.0013498314760465, -0.00225377602725274, -0.00225886700972744, -0.00181077460070966, -0.000453206443289389, 0, 0.000453206443289389, 0.00181077460070966, 0.00225886700972744, 0.00180342699915137, 0.000900495333249651, -0.0013510472669811, -0.00270758288154482, -0.00452899324870693, -0.00500569873443624, -0.00503088186627743, -0.0045955963233375, -0.00415417167913823, -0.00463607691747825, -0.00605921219926842, -0.00703732672057633, -0.00756147270057639, -0.00809721023262, -0.00623652776946138, -0.00482393603085285, -0.00436152860052985, -0.00389294895540893, -0.00390816325475818, -0.00294117859081666, -0.0024576075284326, -0.00345082915773354, -0.00395844158642866, -0.00347653690108984, -0.00348866538730341, -0.0040020063418531, -0.00351494210744541, -0.00453744161395342, -0.00354341043998474, -0.000507228009860583, 0.00202737018250154, 0.00303336936332954, 0.00252079790656623, -0.00151171608532152, -0.00201918291723047, -0.000505433419908385, 0.00252461633713885, 0.00402617554439999, 0.00300902935161851, 
-0.00100200409185014, -0.00301205046999264, -0.00706360151874996, -0.00813425939068146, -0.0123268124806586, -0.0135277817381336, -0.00683314091078202, 0.00263365967346196, 0.000525900616906938, -0.00263227317032033, -0.00422610243350618, -0.00424403820047914, -0.00266169973486008, 0.00159786984729493, 0.00318810046864648, -0.00106157122495887, -0.0042575902526405, 0, 0.00159872136369721, -0.00320000273067222, -0.00535619920052355, -0.00215053846322988, -0.00269469308842396, -0.00704037568232785, -0.00873367996875452, -0.00660068403135128, -0.00442478598035656, 0, 0.00442478598035656, 0.00495459312468327, 0.00383667228648576, 0.00327690080231591, 0.00163443239871519, -0.00217983737542049, 0.00109051264896465, 0.00705949171296005, 0.0118344576470033, 0.0085197533447694, 0.00264760546505549, -0.00211752328990755, -0.00318471606752091, -0.00533050302693994, -0.00643433855264597, -0.00431267514794698, -0.00162206037186685, 0.00054097918549445, -0.0021656749124972, 0, 0.0032467560988696, 0.00377460680028641, 0.00322407587175277, 0.00267881221002675, 0.00267165534280522, 0.00266453661509658, 0.00477834729198179, 0.00528263246442684, 0.00525487283835879, 0.00366204960651473, 0.00364868791549178, 0.00104004169541305, -0.0015604684570949, -0.00312826115380993, -0.00261438057407126, 0.00104657257067053, 0.00573665197743445, 0.00673752224774127, 0.00515199490942919, 0.0035906681307285, 0.00306748706786131, 0, 0.00102040825180616, 0.00508648095637376, 0.00455581653586101, 0.00654419652421367, 0.00899106955985651, 0.00496032763096999, 0.000989119764124524, 0.00148184764088199, 0.000493461643371162, -0.000493461643371162, 0.00639608576582695, 0.0102464912091715, 0.0106229873912866, 0.0109864969415678, 0.00945633524203515, 0.00656662772389005, 0.00837993730674924, 0.0115235200038608, 0.00866991513344573, 0.00453309933098467, 0.00271002875886417, 0, 0, 0.00450045764105589, 0.00403316701762257), .Tsp = c(1, 22.9166666666667, 12), class = "ts") 

ARIMA(3,1,1)(0,0,2)[12]

Coefficients:

ar1 ar2 ar3 ma1 sma1 sma2

-0.5119 0.0622 -0.3987 0.9257 -0.4116 -0.3586

s.e. 0.0611 0.0649 0.0578 0.0259 0.0666 0.0703

sigma^2 estimated as 2.524e-06: log likelihood=1320.17

AIC=-2626.35 AICc=-2625.91 BIC=-2601.34

We can see this is a bit of a mess, but everything we need is there. The parameter estimates and their names, as stated before, are conveniently attached to the object as vectors which can be easily accessed. The orders, however, are not conveniently stored, and will be extracted by converting this output into a string and indexing it to get the desired information. Below are two functions, one which can extract the parameters and another which can extract the orders from this object.
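As a sketch of what those two helpers might look like (the implementations in the project notebooks may differ; the regex assumes the 'ARIMA(p,d,q)(P,D,Q)[s]' line shown in the printout above):

```python
import re

def get_coefs(fit):
    """Return the coefficient names and values from an auto.arima result."""
    coefs = fit.rx2('coef')  # named numeric vector attached to the R object
    return list(coefs.names), list(coefs)

def get_orders(fit):
    """Parse (p, d, q) and (P, D, Q, s) out of the printed model specification."""
    text = '\n'.join(robjects.r['capture.output'](fit))
    spec = re.search(r'ARIMA\((\d+),(\d+),(\d+)\)(?:\((\d+),(\d+),(\d+)\)\[(\d+)\])?', text)
    order = tuple(int(spec.group(i)) for i in (1, 2, 3))
    if spec.group(4) is not None:
        seasonal_order = tuple(int(spec.group(i)) for i in (4, 5, 6, 7))
    else:
        seasonal_order = (0, 0, 0, 0)
    return order, seasonal_order
```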

Now we have helper functions to extract the information we want from our auto.arima output. There is one snag left to manage: when the auto.arima output includes a constant, we need to provide an exogenous variable to the statsmodels SARIMAX object in the form of a vector of ones to which the constant's coefficient will be assigned. Also, creating the auto.arima response object by hand each time is repetitive, so we can create yet another helper function to give us everything we need for the creation of our statsmodels SARIMAX model in one step, as below:
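One way to bundle those steps into a single call, again as a sketch rather than the exact project code:

```python
def auto_arima_to_sarimax_inputs(returns):
    """Run auto.arima on a returns series and translate the result into the
    pieces a statsmodels SARIMAX needs: orders, a parameter vector, and an
    exogenous constant column when auto.arima fit one."""
    r_ts = stats.ts(FloatVector(returns.values), frequency=12)
    fit = forecast.auto_arima(r_ts)

    names, values = get_coefs(fit)
    order, seasonal_order = get_orders(fit)

    # auto.arima reports a constant as 'intercept', 'mean', or 'drift';
    # statsmodels takes it as a coefficient on an all-ones exogenous column
    const_names = {'intercept', 'mean', 'drift'}
    const = [v for n, v in zip(names, values) if n in const_names]
    arma = [v for n, v in zip(names, values) if n not in const_names]

    exog = np.ones((len(returns), 1)) if const else None
    sigma2 = fit.rx2('sigma2')[0]

    # statsmodels parameter order: [exog coefs, AR, MA, seasonal AR, seasonal MA, sigma2],
    # which matches the order auto.arima reports its ARMA coefficients in
    params = const + arma + [sigma2]
    return order, seasonal_order, params, exog
```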

We now have everything we need to quickly generate an appropriate model to a time series. Let’s try this on our first zip code’s log returns, to see if it worked:
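Smoothing a statsmodels SARIMAX with those values and inspecting the fit (a sketch; the stationarity and invertibility constraints are relaxed because the coefficients come from outside statsmodels):

```python
order, seasonal_order, params, exog = auto_arima_to_sarimax_inputs(returns)

model = sm.tsa.statespace.SARIMAX(returns, exog=exog, order=order,
                                  seasonal_order=seasonal_order,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False)

# Smooth with auto.arima's coefficients instead of re-estimating them
results = model.smooth(params)

print(results.summary())
results.plot_diagnostics(figsize=(12, 8))
plt.show()
```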

[Figures: SARIMAX summary and residual diagnostics for the first zip code]

We can see this model has some issues but that it is fairly effective. The residuals are heteroscedastic, as expected, and there is some mild autocorrelation in the residuals at lag 5, which is not ideal. However, the residuals are centered around zero, and although the leptokurtosis (excess kurtosis, i.e., a sharper peak and heavier tails, caused by the volatility clustering) causes the Jarque-Bera test's null hypothesis of normally distributed residuals to be rejected, they are not too far off, having reasonable skew and kurtosis. Testing this process on a few other zip codes shows that the models generated tend not to have autocorrelated residuals, and have close-to but not quite normally distributed residuals, with heteroscedasticity following the same general pattern of increasing volatility over time and the resulting leptokurtosis. It is widely acknowledged that a perfect SARIMAX fit is uncommon for financial time series, and these models are fairly solid for this domain, especially considering the rapidity with which they are being generated.

Comparing models of all zip codes

We now want to apply this modeling process to all zip codes which have complete data and whose most recent average sale prices are below $500,000. Each model will be used to make a forecast over 5 years, and we will then take the average of the predicted mean and of the lower confidence limit over the projection period for each model for comparisons. These two metrics make a good combination for comparing zip codes: a high mean of expected returns tells us that the model has an optimistic outlook on future returns, and when this is paired with a higher mean of the lower confidence limit over the same period, it means that not only does the zip code have a positive outlook for gains, it also has a relatively tight confidence interval, and thus lower expected variance. Zip codes which experienced less intense volatility clustering during the financial crash will have lower estimates for sigma2 in their models, and thus will have tighter confidence bands in their forecasts. This method should lead us to zip codes which are at the top of the list for 3 of our investment criteria: high expected capital appreciation rate, low expected volatility, and resilience to poor economic conditions during the crash. Once we have our contenders, we can then look at their P2R ratios to find those which also satisfy our need for leasing profitability.

Although we have a novel and efficient way to generate the models, 14,723 zip codes is still a lot, and generating a model for each one will take a few hours. The following code will run the appropriate loop to compile a dataframe of the metrics which we will use to select our top zip code candidates.
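A hedged sketch of that loop, continuing from the helpers above; candidate_zips stands in for the list of zip codes with complete data and latest prices under $500,000, and the metric names are my own:

```python
horizon = 60  # 5 years of monthly steps
records = []

for i, zipcode in enumerate(candidate_zips):
    prices = groups.get_group(zipcode).set_index('date')['value']
    returns = np.log(prices).diff().dropna().asfreq('MS')

    order, seasonal_order, params, exog = auto_arima_to_sarimax_inputs(returns)
    print(f'{i} - {zipcode}')
    print(f'{order}{seasonal_order} model with '
          + ('a constant' if exog is not None else 'no constant'))

    res = sm.tsa.statespace.SARIMAX(returns, exog=exog, order=order,
                                    seasonal_order=seasonal_order,
                                    enforce_stationarity=False,
                                    enforce_invertibility=False).smooth(params)

    fc = res.get_forecast(steps=horizon,
                          exog=np.ones((horizon, 1)) if exog is not None else None)
    records.append({'zipcode': zipcode,
                    'mean_return': fc.predicted_mean.mean(),
                    'mean_lower_ci': fc.conf_int().iloc[:, 0].mean()})

results_df = pd.DataFrame(records)
```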

0 - 1001
(3, 1, 1)(0, 0, 2, 12) model with no constant
1 - 1002
(0, 1, 3)(0, 0, 2, 12) model with no constant
2 - 1005
(2, 1, 1)(0, 0, 2, 12) model with no constant
3 - 1007
(4, 1, 3)(1, 0, 1, 12) model with no constant
4 - 1008
(0, 1, 3)(0, 0, 1, 12) model with no constant
...and so on forever

Excellent. A few short hours later and we have some nice forecast metrics to lead us to superior zip codes for investment. The most informative metric we have taken is the mean of the lower confidence limit throughout the 5 year forecast. There is a lot of information built into this one number, because it can't be high without both a high predicted mean of returns over time and a low expected variance. This means that we can sort our results by this metric in descending order to find our top contenders so far, and then include P2R to finalize a list of our top 5 zip codes for investment.
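The sort itself is a single line:

```python
# Rank zip codes by the average lower confidence limit of their 5-year forecasts
top = results_df.sort_values('mean_lower_ci', ascending=False)
top.head(10)
```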

[Figure: zip codes ranked by the mean lower confidence limit of their forecasts]

Choosing the top 5 zip codes

The results are in, but we need a more descriptive dataframe, and we want the P2R ratios as well. Earlier, we saw that the P2R dataframe has the locations recorded in a ‘Region Name’ column, which is the name of the metro together with the state abbreviation. We can create a similar column from our original dataframe combining the metro and state columns, so we can merge the two dataframes. Let’s limit our results to the top 200, then iterate through the index and create an info dataframe, then merge everything together.
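A sketch of that step; rather than iterating row by row, this version uses a groupby and two merges, and the column names involved (especially the P2R column) are assumptions about the source files:

```python
# Keep the 200 best candidates
top200 = top.head(200).copy()

# Most recent value and location info per zip code, plus a 'Metro, State' key
info = (melted[['RegionName', 'City', 'State', 'Metro', 'value']]
        .groupby('RegionName').last().reset_index())
info['Region Name'] = info['Metro'] + ', ' + info['State']

# Attach location info, then the price-to-rent ratios (column name assumed)
merged = (top200
          .merge(info, left_on='zipcode', right_on='RegionName')
          .merge(p2r[['Region Name', 'Price To Rent Ratio']], on='Region Name', how='left'))
```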

[Figure: merged results with location info and P2R ratios]

Now we have all of the information we need in one place. We can see that there are missing values in the P2R column, so we will not be able to comment on the leasing profitability for these zip codes, and we will focus on those for which we do have P2R info. Let’s filter our results to everything with a P2R below the national average and go from there.
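Filtering on the national average P2R of 11.44 (again, the P2R column name is an assumption):

```python
national_avg_p2r = 11.44

# Keep only contenders with known, below-average price-to-rent ratios
contenders = merged[merged['Price To Rent Ratio'] < national_avg_p2r]
```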

[Figure: contenders with P2R ratios below the national average]

Now our search is coming to a close. What is left to do from here is inspect the models for each of these zip codes individually to find the best opportunities for investors. To save a bit of space here: the top two zip codes on this list had historically low volatility and then saw massive, unexpected growth rates in the most recent month or two, a combination which makes their model forecasts quite optimistic in the short term. However, it would be better to advise the client to invest in zip codes with more than just a couple of recent months of great returns, so we move on down the list. San Antonio has a great P2R ratio, close to Dogen's recommendation of 9.6. Since our target P2R is around 10, let's consider what this means: it takes 10 years of rent payments to pay off the cost of ownership of a property. This means that if an investor holds the property for 10 years and leases it out during that time, they will have paid off their initial investment and can still sell the property at a potentially higher value, giving them a great return on their investment. Let's look at the model for San Antonio and make forecasts over a period of 10 years.
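The same modeling steps apply as before, now for zip code 78210 (the San Antonio zip code in the final recommendations); this is a sketch of the first of the three code blocks:

```python
# Zip code 78210 (San Antonio, TX): generate and smooth its model as before
prices = groups.get_group(78210).set_index('date')['value']
returns = np.log(prices).diff().dropna().asfreq('MS')

order, seasonal_order, params, exog = auto_arima_to_sarimax_inputs(returns)
res = sm.tsa.statespace.SARIMAX(returns, exog=exog, order=order,
                                seasonal_order=seasonal_order,
                                enforce_stationarity=False,
                                enforce_invertibility=False).smooth(params)

print(res.summary())
res.plot_diagnostics(figsize=(12, 8))
plt.show()
```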

[Figures: SARIMAX summary and residual diagnostics for zip code 78210 (San Antonio, TX)]

We can see this model fits fairly well, apart from the heteroscedasticity we expect from any zip code in this dataset. The residuals are close to normally distributed and are not autocorrelated. Let's look at an in-sample and an out-of-sample 10 year forecast.

Here’s a 10 year forecast in-sample, taking us back to to the middle of the crash, when things were most uncertain.

[Figure: 10-year in-sample forecast beginning in April 2008]

We can see that even though our sigma2 may be a bit underestimated, the confidence bands of our model contained everything following April 2008 apart from the recent spike in growth. Let’s see what the model predicts about the next 10 years.
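And the out-of-sample version, projecting 120 months past the end of the data:

```python
# 10-year (120-month) out-of-sample forecast
steps = 120
fc = res.get_forecast(steps=steps,
                      exog=np.ones((steps, 1)) if exog is not None else None)
ci = fc.conf_int()

ax = returns.plot(figsize=(12, 5), label='observed log returns')
fc.predicted_mean.plot(ax=ax, label='10-year forecast')
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], alpha=0.2)
ax.legend()
plt.show()
```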

[Figure: 10-year out-of-sample forecast for zip code 78210]

This looks pretty good. The predicted mean is around 2% a month and the confidence bands are mostly above the zero line. The area in the shaded region which falls below the zero line is much smaller than the area above it, giving an investor confidence that they are unlikely to lose money in the next 10 years.

Picking the top 5 zip codes at this point was done by hand, examining individual models for each. The three code blocks above can be adjusted to generate a model and its predictions for any zip code just by changing the zip code in the initial get_group call. In picking the top 5, selecting two zip codes in the same city was avoided, since diversification is generally a good thing in capital investment; in the event that something adversely affects the market in one location, the portfolio is not over-exposed. However, two zip codes from the Dallas/Fort Worth metro were picked, although in different districts, one suburban and one urban. The best 5 zip codes were:

[Figure: the five recommended zip codes]

All of these zip codes have good forecasts of mean returns, low estimated volatility, and P2R ratios below the national average. Because these zip codes also have the highest average lower confidence limits in their forecasts, they represent the areas least likely to lose value into the future, according to the SARIMAX models. The properties in these zip codes can have the cost of ownership paid off with around 10 years of rent, so that, assuming a home does nothing other than hold its value, the investor can have 100% profit over that time interval, and any capital gains on the properties would be an addition to that. Considering these zip codes have the lowest probability of losing value, they are quite attractive choices for a real estate investor. Let's look at their price curves together.

[Figure: price histories of the five recommended zip codes]

This gives us a good picture of the resilience of these zip codes to the market crash and the recent growth that we were looking for. Let’s look at the models from the rest of the top 5 zip codes.

Bell Buckle, TN:

[Figures: model diagnostics and 10-year forecasts for Bell Buckle, TN (37020)]

Dallas, TX:

[Figures: model diagnostics and 10-year forecasts for Dallas, TX (75228)]

Markle, IN:

[Figures: model diagnostics and 10-year forecasts for Markle, IN (46770)]

Flower Mound, TX:

[Figures: model diagnostics and 10-year forecasts for Flower Mound, TX (75028)]

Conclusion

In this project, an imaginary real estate investment firm was to be advised on the top five zip codes to invest in. To answer this question, domain knowledge was combined with comparisons of SARIMAX models generated for the return series of each individual zip code. To produce such a large quantity of models in a reasonable time frame, an efficient method for generating well-fitting models for many time series in Python was demonstrated. By using the strength of the auto.arima function in R's forecast package within a Python environment, it was possible to keep a Python workflow and statsmodels functionality while generating appropriate models quickly. The zip codes were evaluated by their outlooks on capital appreciation, looking for strong growth with low expected variance. The P2R ratios of the zip codes with the best outlooks were then used to filter the results, and zip codes with both strong outlooks and above-average leasing profitability were hand selected for recommendation. Since the target P2R was around 10, 10 year forecasts were generated to see what the model expectations were over a holding period for which the owner could expect to pay off the costs of property ownership. Since the zip codes selected (78210, 37020, 75228, 46770, and 75028) were among the least likely of any of the 14,723 in the data set to depreciate over that time period, they make the best prospects for a real estate investor.
